Reputation: 61
I'm trying to figure out what are the ways (and which of them the best one) of extraction of Values for predefined Keys in the unstructured text?
Input:
Key list: ['drug', 'name', 'weather']
Output:
['drug=favipiravir', 'drug=nazivin', 'name=Yury', 'weather=cold']
So, as you can see, in the 3d sentence there is no explicit key 'name' and therefore no value extracted (I think there is the difference with NER). At the same time, 'drug' and 'medicine' are synonyms and we should treat 'medicine' as 'drug' key and extract the value also.
And the next question, what if the key set will be mutable? Should I use as a base regexp approach because of predefined Keys or there is a way to implement it with supervised learning/NN? (but in this case how to deal with mutable keys?)
Upvotes: 3
Views: 2392
Reputation: 421
You can use a parser to tag words. Your problem is similar to Named Entity Recognition (NER). A lot of libraries, like NLTK
in Python, have POS taggers available. You can try those. They are generally trained to identify names, locations, etc. Depending on the type of words you need, you may need to train the parser. So you'll need some labeled data also.
Check out this link:
https://nlp.stanford.edu/software/CRF-NER.html
Upvotes: 2