Reputation: 796
I would like to perform some natural language processing on cooking recipes, in particular the ingredients (perhaps preparation later on). Basically I am looking to create my own set of POS tags to help me determine the meaning of an ingredient line.
For example, if one of the ingredients was: 3/4 cup (lightly packed) flat-leaf parsley leaves, divided
I would want tags to express the ingredient being listed and the quantity, which is usually a number followed by some unit of measurement. For example:
3\NUM-QTY/\FRACTION4\NUM-QTY cup\N-MEAS (lightly\ADV packed\VD) [flat-leaf\ADJ parsley\N]\INGREDIENT leaves\N, divided\VD
The tags I found here.
I am uncertain about a few things:
I feel like this language processing is so specific that it would be beneficial to train a tagger on an applicable set, but I'm not exactly sure how to proceed.
Thanks!
Upvotes: 5
Views: 314
Reputation: 5363
Use the pattern.search library.
The Python pattern library supports many tags [1], including a cardinal number tag (CD).
Once you have tagged the cardinals, a fraction is either "cardinal/cardinal" or a mixed number such as "cardinal cardinal/cardinal".
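To make the fraction idea concrete, here is a minimal sketch that uses only the standard library rather than pattern itself. The name `parse_quantity` and the regular expression are my own assumptions for illustration; it only handles a leading integer, a simple fraction, or a mixed number written with ASCII digits.

```python
import re
from fractions import Fraction

# Hypothetical helper, not part of the pattern library.
# First alternative: optional whole number, then "num/den" (e.g. "1 1/2", "3/4").
# Second alternative: a plain integer (e.g. "2").
QUANTITY_RE = re.compile(r"\s*(?:(?:(\d+)\s+)?(\d+)/(\d+)|(\d+))")

def parse_quantity(line):
    """Return (quantity, rest_of_line), or (None, line) if no leading number."""
    match = QUANTITY_RE.match(line)
    if match is None:
        return None, line
    whole, num, den, integer = match.groups()
    if integer is not None:
        quantity = Fraction(int(integer))
    elif int(den) == 0:
        # "3/0" is not a usable quantity, so treat the line as unparsed.
        return None, line
    else:
        quantity = Fraction(int(whole or 0)) + Fraction(int(num), int(den))
    return quantity, line[match.end():].strip()

quantity, rest = parse_quantity(
    "3/4 cup (lightly packed) flat-leaf parsley leaves, divided"
)
print(quantity, "|", rest)
```

Unicode fractions such as "¾" and ranges such as "2-3" are not covered and would need extra cases.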
As for quantities, you should build a taxonomy of cooking quantities. The Python pattern library also supports lemmatization [2].
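A taxonomy can start out as a plain dictionary. The sketch below is an assumption of what one might look like, not something taken from pattern: the unit list is deliberately tiny, and the explicit surface forms stand in for real lemmatization.

```python
# Hypothetical, hand-written taxonomy: kind -> canonical unit -> surface forms.
UNIT_TAXONOMY = {
    "volume": {
        "cup": ["cup", "cups", "c"],
        "tablespoon": ["tablespoon", "tablespoons", "tbsp"],
        "teaspoon": ["teaspoon", "teaspoons", "tsp"],
    },
    "weight": {
        "ounce": ["ounce", "ounces", "oz"],
        "pound": ["pound", "pounds", "lb", "lbs"],
        "gram": ["gram", "grams", "g"],
    },
}

# Flatten into surface form -> (canonical unit, kind) for fast lookup.
UNIT_LOOKUP = {
    form: (canonical, kind)
    for kind, units in UNIT_TAXONOMY.items()
    for canonical, forms in units.items()
    for form in forms
}

def lookup_unit(token):
    """Return (canonical_unit, kind) for a token, or None if it is not a unit."""
    # Lowercase and drop a trailing period so "Tbsp." matches "tbsp".
    return UNIT_LOOKUP.get(token.lower().rstrip("."))

print(lookup_unit("Cups"))
print(lookup_unit("parsley"))
```

With a real lemmatizer in place, the plural forms could be dropped from the lists and only abbreviations would need to be spelled out.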
I think that with pattern.search [2] you could build a Constraint that fits your data and run pattern searches over the text with it.
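To illustrate what a tag-based search does, here is a rough standard-library sketch. It is not the pattern.search API: `find_tag_sequences` is a hypothetical helper, the tags in `tagged_line` are assigned by hand, and `UNIT` is an assumed custom tag that a units taxonomy would supply.

```python
def find_tag_sequences(tagged, pattern):
    """Return every run of words whose tags match `pattern` in order."""
    if not pattern:
        return []
    hits = []
    width = len(pattern)
    # Slide a window of len(pattern) tokens across the tagged line.
    for start in range(len(tagged) - width + 1):
        window = tagged[start:start + width]
        if all(tag == want for (_, tag), want in zip(window, pattern)):
            hits.append([word for word, _ in window])
    return hits

# Hand-tagged version of part of the ingredient line from the question.
tagged_line = [
    ("3/4", "CD"),
    ("cup", "UNIT"),
    ("flat-leaf", "JJ"),
    ("parsley", "NN"),
    ("leaves", "NNS"),
]

print(find_tag_sequences(tagged_line, ["CD", "UNIT"]))
print(find_tag_sequences(tagged_line, ["JJ", "NN"]))
```

A real constraint language would also allow optional and alternative tags, which this exact-match version does not.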
[1] http://www.clips.ua.ac.be/pages/mbsp-tags
[2] http://www.clips.ua.ac.be/pages/pattern-search
Upvotes: 3