Reputation: 2625
I have a large database with a lot of entries (most of them movies), and the only information each entry has is a description. The description of the entry with ID 1, for example, might be:
'Forrest Gump is a 1994 American epic romantic-comedy-drama film based on the 1986 novel of the same name by Winston Groom. The film was directed by Robert Zemeckis and stars Tom Hanks, Robin Wright, Gary Sinise, Mykelti Williamson, and Sally Field.'
I also have some txt documents that are basically dictionaries, structured like this:
actors.txt
Mickey Mouse
Tom Hanks
...
directors.txt
Donald Duck
Robert Zemeckis
...
What I want to do is analyse the description of every entry and extract the named entities that appear in my dictionaries. So if the text contains 'Tom Hanks', I want to recognize that the entry with ID 1 has Tom Hanks as an actor, and so on. The output should look something like this:
Actor: Tom Hanks, Actor: Robin Wright, Director: Robert Zemeckis, Distributor: Paramount Pictures.
or any other format that is easy to manipulate.
Upvotes: 1
Views: 628
Reputation: 64
All you have to do is use Solr: set up a few new field types (like text_actors) in its schema, each linked to the appropriate dictionary, and then import the database. From what I know, this gives you a searchable index that you can query for all the results and then use to populate your own database.
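If you want to try the idea before setting up Solr, the dictionary lookup itself can be sketched in plain Python. This is not Solr's mechanism, just a minimal illustration of matching dictionary entries against a description. The function names `load_dictionary` and `find_entities` are made up for this example, and the in-memory lists stand in for the contents of actors.txt and directors.txt.

```python
import re


def load_dictionary(path):
    """Read one name per line from a dictionary file such as actors.txt."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def find_entities(description, dictionaries):
    """Return (label, name) pairs for every dictionary name found in the text.

    `dictionaries` maps a label (e.g. "Actor") to a list of names.
    Matching is case-insensitive and only accepts whole names, so
    "Tom Hanks" does not match inside "Tom Hanksworth".
    """
    found = []
    for label, names in dictionaries.items():
        for name in names:
            pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
            if re.search(pattern, description, flags=re.IGNORECASE):
                found.append((label, name))
    return found


# Hypothetical in-memory stand-ins for the dictionary files; with real
# files you would call load_dictionary("actors.txt") and so on.
dictionaries = {
    "Actor": ["Mickey Mouse", "Tom Hanks"],
    "Director": ["Donald Duck", "Robert Zemeckis"],
}

description = (
    "Forrest Gump is a 1994 American epic romantic-comedy-drama film based "
    "on the 1986 novel of the same name by Winston Groom. The film was "
    "directed by Robert Zemeckis and stars Tom Hanks, Robin Wright, "
    "Gary Sinise, Mykelti Williamson, and Sally Field."
)

entities = find_entities(description, dictionaries)
print(", ".join(f"{label}: {name}" for label, name in entities))
# prints: Actor: Tom Hanks, Director: Robert Zemeckis
```

Running one regular expression per name is fine for small dictionaries but will probably be too slow for a large database with long name lists, which is where an indexed approach such as Solr becomes more attractive.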
Upvotes: 1