user1513994

Reputation: 11

Neo4j, storing text data in node properties, text analysis & full-text search a requirement

Is it ok to store text data in the graph nodes when text analysis will be a requirement?

I have an application in mind involving thousands of documents that are interlinked through subject, author, references, etc. I want to store the links between the documents, but also be able to analyse the text of the documents using text analysis techniques. That analysis will require reading the text of the documents on all nodes, for example to arrive at word counts.

So far I've researched a number of options, trying to arrive at the best/most practical one:

  1. Use a relational database with bridge tables to manage the relationship information (Con: the SQL queries to "traverse" the relationships will be difficult).
  2. Use a graph database to store both the relationship and the document information (Cons: graph databases aren't optimal for text storage and retrieval; I'm worried that running full-text analysis across all nodes will be slow and difficult to combine with text analysis frameworks).
  3. Use a graph database to store the relationships and another store, such as CouchDB, to hold the document information (Cons: managing the two stores and keeping them in sync).
  4. Use a graph database to store only the relationships, and keep the documents on disk or in HDFS etc. for analysis.
  5. Other?

Can anyone suggest which of these is the best approach to implement?

Thanks,

Paul

Upvotes: 1

Views: 960

Answers (1)

p3rnilla

Reputation: 351

Neo4j's default index provider (Lucene) can do some text analytics. If that is not enough, then option 3 or 4 is probably the best.

http://lucene.apache.org/
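
To make options 3 and 4 a bit more concrete, here is a rough sketch of the split: relationships live in one place, document text in another, and both are keyed by the same document ID. Everything in it is an assumption for illustration only. The `links` dict stands in for the graph database, the `texts` dict stands in for the external store (files on disk, HDFS, CouchDB), and the IDs, relationship names, and helper functions are made up. It does not use any Neo4j or Lucene API.

```python
from collections import Counter
import re

# Hypothetical stand-in for the graph store:
# document ID -> list of (relationship type, target document ID).
links = {
    "doc1": [("REFERENCES", "doc2"), ("SAME_AUTHOR", "doc3")],
    "doc2": [("REFERENCES", "doc3")],
    "doc3": [],
}

# Hypothetical stand-in for the external text store, keyed by the same IDs.
texts = {
    "doc1": "Graph databases store relationships.",
    "doc2": "Relationships link documents to documents.",
    "doc3": "Text analysis counts words in documents.",
}


def word_counts(text):
    """Count lowercase alphanumeric tokens in one document."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def corpus_counts(store):
    """Word counts over every document; needs only the text store."""
    total = Counter()
    for text in store.values():
        total.update(word_counts(text))
    return total


def neighbours(doc_id, rel_type):
    """Follow one relationship type; needs only the graph side."""
    return [target for rel, target in links.get(doc_id, []) if rel == rel_type]


def counts_for_referenced(doc_id):
    """Combine both: traverse the graph, then analyse the linked texts."""
    total = Counter()
    for target in neighbours(doc_id, "REFERENCES"):
        total.update(word_counts(texts[target]))
    return total


if __name__ == "__main__":
    print(corpus_counts(texts).most_common(3))
    print(neighbours("doc1", "REFERENCES"))
    print(counts_for_referenced("doc1"))
```

The point of the split is that the corpus-wide pass never touches the graph, so it can be handed to whatever text analysis framework you like, while the graph is only consulted to decide which document IDs to fetch. The cost is the one already listed under option 3: the shared IDs have to be kept in sync between the two stores.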

Upvotes: 1
