Arturo Zamora

Reputation: 31

How to identify patterns inside a text and categorize them

From a table that stores medicine descriptions I need to identify the product name, strength, product quantity and pharmaceutical company of each entry. The goal is to have a copy of the table with a predefined structure.

Current table: current table

Normalized table: normalized table

So far I've read a little about Natural Language Processing, but I'd like to know whether there is another approach. I was thinking of using regex, but there are a lot of cases to cover.

Any kind of insight would be appreciated.

Upvotes: 3

Views: 938

Answers (1)

polm23

Reputation: 15593

Based on your examples, your data is regular enough that regexes might be a good approach. A more sophisticated approach you could try is Named Entity Recognition (NER). The New York Times used CRF++ to extract ingredient information from recipes and wrote about it here.

NER Example
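
For the regex route, here is a minimal sketch of what a single pattern with named groups could look like. The layout it assumes (name, then strength, then quantity and dosage form, then company) and the sample strings are hypothetical, since the actual table is not reproduced here; the unit and dosage-form lists are also just placeholders you would extend for your data.

```python
import re

# Assumed (hypothetical) layout of a description:
#   "<name> <strength><unit> <quantity> <form> <company>"
PATTERN = re.compile(
    r"""
    ^(?P<name>.+?)\s+                                   # product name, up to the strength
    (?P<strength>\d+(?:\.\d+)?\s*(?:mg|g|ml|mcg|iu))\s+ # e.g. "500 mg" or "250mg"
    (?P<quantity>\d+)\s+(?:tablets?|capsules?)\s+       # e.g. "20 tablets"
    (?P<company>.+)$                                    # whatever remains is the company
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse(description):
    """Return a dict of the extracted fields, or None if the pattern does not fit."""
    match = PATTERN.match(description.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    for text in ["Paracetamol 500 mg 20 tablets Acme Pharma", "unknown entry"]:
        print(text, "->", parse(text))
```

Rows that come back as None are the ones the pattern does not cover. Collecting them is a cheap way to see how many distinct cases there really are before deciding whether a handful of extra patterns is enough or whether NER is worth the effort.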

Upvotes: 2
