NoBody

Reputation: 1

Tag a Sentence based on Tagged Sentences

I am building a system that tags a sentence based on previously tagged sentences. I have a corpus structured like the Known Questions below.

Known Questions:

city_name What are the most popular city in spain?

amount_of_people How many people are in the city center?

New questions:

What are the most popular city in Italy?

How many people are in the at the stadium?

What is the nearest city to New York?

Example of tags:

city_name

amount_of_people

Desired result:

city_name What are the most popular city in Italy?

amount_of_people How many people are in the at the stadium?

city_name What is the nearest city to New York?

I have 30 tags and 350 sentences in total. Is there a Python framework or a known algorithm that can analyze the corpus and tag a new sentence based on it?

Upvotes: 0

Views: 232

Answers (1)

darthbhyrava

Reputation: 515

Typically, this is treated as a machine learning classification task. You could use any number of approaches, from Naive Bayes to a multilayer perceptron to softmax-based DNNs. I would strongly suggest using one of these for such tasks, but since you have only 350 questions, I can't say whether a classifier can learn from so little data without experimenting.
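To make the classification idea concrete, here is a minimal sketch of a multinomial Naive Bayes tagger over bag-of-words features, written with only the Python standard library. The class name NaiveBayesTagger and the tokenize helper are made up for illustration, and the training data is just the two known questions from the post, so treat it as a starting point to experiment with rather than a tuned solution.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTagger:
    """Hypothetical helper: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, sentences, tags):
        self.tag_counts = Counter(tags)          # number of sentences per tag
        self.word_counts = defaultdict(Counter)  # word frequencies per tag
        self.vocab = set()
        for sentence, tag in zip(sentences, tags):
            tokens = tokenize(sentence)
            self.word_counts[tag].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, sentence):
        total = sum(self.tag_counts.values())
        # Words never seen in training carry no evidence, so skip them.
        tokens = [t for t in tokenize(sentence) if t in self.vocab]
        best_tag, best_score = None, -math.inf
        for tag, n_sentences in self.tag_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_sentences / total)
            denom = sum(self.word_counts[tag].values()) + len(self.vocab)
            for token in tokens:
                score += math.log((self.word_counts[tag][token] + 1) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


# The two known questions from the post; a real run would load all 350.
known = [
    ("city_name", "What are the most popular city in spain?"),
    ("amount_of_people", "How many people are in the city center?"),
]
tagger = NaiveBayesTagger().fit([q for _, q in known], [t for t, _ in known])

for question in [
    "What are the most popular city in Italy?",
    "How many people are in the at the stadium?",
    "What is the nearest city to New York?",
]:
    print(tagger.predict(question), question)
```

With 30 tags and 350 sentences, it would be worth holding out part of the corpus to check accuracy before relying on a model like this.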

However, if you wish to approach this using rule-based methods, I would suggest using dependency parsing. This is under the assumption that all your data consists of grammatically well-formed questions, and that there is a semantic relation between the tags and the questions.

Let's use the Stanford Dependency Parser here for the question: What is the nearest city to New York?

UNIVERSAL DEPENDENCIES:

root(ROOT-0, What-1)
cop(What-1, is-2)
det(city-5, the-3)
amod(city-5, nearest-4)
nsubj(What-1, city-5)
case(York-8, to-6)
compound(York-8, New-7)
nmod(city-5, York-8)

As you can see, the nsubj (nominal subject) relation links 'What' and 'city' (understand more about the dependencies here). So every time you have 'city' as the nsubj of 'what' (say), you could assign the city_name tag to the question.

Similarly, if 'people' is the nsubj of a question (and 'many' appears as an amod), you could assign the amount_of_people tag to that question.

You'd have to look through the data and find the best dependency-based rules for each of the 30 tags in a similar manner, and that should do the trick.
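The rule-based approach can be sketched roughly as follows. This does not call the Stanford parser; it assumes you already have its output as text in the format shown above, and it turns that into triples to match against rules. The names parse_dependencies, matches, tag_question and the RULES table are hypothetical, and the two rules are only the ones suggested in this answer.

```python
import re

# Matches one line of the textual output, e.g. "nsubj(What-1, city-5)".
TRIPLE = re.compile(r"([\w:]+)\((\S+)-\d+, (\S+)-\d+\)")


def parse_dependencies(text):
    """Turn parser output into a set of lowercased (relation, head, dependent) triples."""
    triples = set()
    for line in text.strip().splitlines():
        match = TRIPLE.fullmatch(line.strip())
        if match:
            rel, head, dep = match.groups()
            triples.add((rel, head.lower(), dep.lower()))
    return triples


# Hypothetical rule table: a tag applies when every pattern matches some triple.
# None acts as a wildcard for that position.
RULES = [
    ("city_name", [("nsubj", "what", "city")]),
    ("amount_of_people", [("nsubj", None, "people"), ("amod", None, "many")]),
]


def matches(pattern, triples):
    """True if any triple agrees with the pattern on all non-wildcard positions."""
    return any(
        all(p is None or p == v for p, v in zip(pattern, triple))
        for triple in triples
    )


def tag_question(triples, rules=RULES):
    """Return the first tag whose patterns all match, or None if no rule fires."""
    for tag, patterns in rules:
        if all(matches(p, triples) for p in patterns):
            return tag
    return None


# Parser output for "What is the nearest city to New York?", copied from above.
PARSE = """
root(ROOT-0, What-1)
cop(What-1, is-2)
det(city-5, the-3)
amod(city-5, nearest-4)
nsubj(What-1, city-5)
case(York-8, to-6)
compound(York-8, New-7)
nmod(city-5, York-8)
"""

print(tag_question(parse_dependencies(PARSE)))
```

Rules are checked in order, so more specific rules should come first, and a question that matches none of them gets no tag.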

Upvotes: 0
