abroekhof

Reputation: 796

Hand tagging a training set with customized tags

I would like to perform some natural language processing on cooking recipes, in particular the ingredients (perhaps preparation later on). Basically I am looking to create my own set of POS tags to help me determine the meaning of an ingredient line.

For example, if one of the ingredients was: 3/4 cup (lightly packed) flat-leaf parsley leaves, divided

I would want tags to express the ingredient being listed and the quantity, which is usually a number followed by some unit of measurement. For example:

3\NUM-QTY/\FRACTION4\NUM-QTY cup\N-MEAS (lightly\ADV packed\VD) [flat-leaf\ADJ parsley\N]\INGREDIENT leaves\N, divided\VD

The tags I found here.

I am uncertain about a few things:

  1. Should I be using custom tags, or should I be doing some sort of post-tagging processing after using a pre-existing tagger?
  2. If I do use custom tags, is the best way to make a training text to just go through an ingredient list and tag everything by hand?

I feel like this language processing is so specific that it would be beneficial to train a tagger on an applicable set, but I'm not exactly sure how to proceed.

Thanks!

Upvotes: 5

Views: 314

Answers (1)

Alex Brooks

Reputation: 5363

Use the pattern.search library.

The Python Pattern library supports many tags [1], including a cardinal number tag (CD).

Once you have tagged the cardinals, a fraction is "cardinal/cardinal" and a mixed number is something like "cardinal cardinal/cardinal".
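As a rough illustration of the "cardinal/cardinal" idea, here is a minimal sketch that uses only the standard library, with a regular expression standing in for the CD tags a real tagger would give you. The names `QUANTITY_RE` and `parse_quantity` are made up for this example and are not part of Pattern.

```python
import re
from fractions import Fraction

# Leading quantity: "1 1/2" (mixed number), "3/4" (fraction) or "2" (whole number).
# Denominators must start with a non-zero digit so Fraction never divides by zero.
QUANTITY_RE = re.compile(
    r"^\s*(?:(\d+)\s+(\d+)/([1-9]\d*)|(\d+)/([1-9]\d*)|(\d+))"
)

def parse_quantity(line):
    """Return (quantity, rest_of_line), or (None, line) if no leading number."""
    match = QUANTITY_RE.match(line)
    if match is None:
        return None, line
    whole, mixed_num, mixed_den, num, den, integer = match.groups()
    if whole is not None:
        # cardinal cardinal/cardinal
        quantity = int(whole) + Fraction(int(mixed_num), int(mixed_den))
    elif num is not None:
        # cardinal/cardinal
        quantity = Fraction(int(num), int(den))
    else:
        # a single cardinal
        quantity = Fraction(int(integer))
    return quantity, line[match.end():].strip()

quantity, rest = parse_quantity(
    "3/4 cup (lightly packed) flat-leaf parsley leaves, divided"
)
print(quantity, "|", rest)
```

This only handles a quantity at the very start of the line; ranges such as "2 to 3" or unicode fractions would need extra cases.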

As for quantities, you should build a taxonomy of cooking quantities. The Pattern library also supports lemmatization [2].
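To make the taxonomy idea concrete, here is a small sketch in plain Python. The categories, units and spellings in `UNIT_TAXONOMY` are assumptions chosen for illustration, not a complete list, and the lookup table is a stand-in for real lemmatization: it simply maps each listed surface form back to one canonical unit.

```python
# Hypothetical taxonomy: category -> canonical unit -> accepted surface forms.
UNIT_TAXONOMY = {
    "volume": {
        "cup": ["cup", "cups", "c"],
        "tablespoon": ["tablespoon", "tablespoons", "tbsp"],
        "teaspoon": ["teaspoon", "teaspoons", "tsp"],
    },
    "weight": {
        "gram": ["gram", "grams", "g"],
        "ounce": ["ounce", "ounces", "oz"],
        "pound": ["pound", "pounds", "lb", "lbs"],
    },
}

# Invert the taxonomy once so each surface form maps to (canonical unit, category).
UNIT_LOOKUP = {
    form: (canonical, category)
    for category, units in UNIT_TAXONOMY.items()
    for canonical, forms in units.items()
    for form in forms
}

def lookup_unit(token):
    """Return (canonical unit, category) for a token, or None if it is not a known unit."""
    return UNIT_LOOKUP.get(token.lower().rstrip("."))

print(lookup_unit("Cups"))
print(lookup_unit("tbsp."))
print(lookup_unit("parsley"))
```

A token that follows a cardinal and resolves through `lookup_unit` is a good candidate for the unit-of-measurement tag; anything else after the number is more likely part of the ingredient itself.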

I think that with pattern.search [2] you could build a Constraint that fits your data and use it to run pattern searches on the text.

[1] http://www.clips.ua.ac.be/pages/mbsp-tags
[2] http://www.clips.ua.ac.be/pages/pattern-search

Upvotes: 3
