Reputation: 97
I am using GATE in one of my applications and I have few queries related to Multi-tenancy. My requirements are as given below.
I searched the web for few samples and though I found some thoughts on working with Processing Resource, I have no idea how the performance will be affected.
Your help is much appreciated.
Thanks in advance,
Sajith
Upvotes: 0
Views: 66
Reputation: 1683
That's an interesting use-case for a GATE gazetteer.
One thing I believe you should definitely do is add the user ID as a feature when you're creating the document. This way you'll be able to make your MongoDB query in a processing resource later on.
When you're processing the document, you have several options:
Create a custom PR which calls MongoDB and replicates the DefaultGazetteer code but with overwritten "init" method (or inherit or wrap it, haven't looked into much detail if that's possible). Instead of the default init method you should provide your list of keywords, then set the needed fields and call execute().
If you don't have too many keywords, create a custom PR (or groovy scripting PR) which calls MongoDB and does some simple regex search like the one in this thread. They also suggest the stringsearch library in the comments. Then just use start and end indices to create Lookup annotations on your own.
You said you don't want that but still, several million words can be handled by both the default and the Hash gazetteer. Although, you should be careful as gate documents could be very memory-intensive if you have too many annotations - in your case Lookups for all user keywords.
Hope this helps.
Upvotes: 1