Reputation: 8160
Is there a huge CSV/XML or whatever file somewhere that contains a list of english verbs and their variations (e.g sell -> sold, sale, selling, seller, sellee)?
I imagine this will be useful for NLP systems, but there doesn't seem to be a listing anywhere, or it could be my terrible googling skills. Does anybody have a clue otherwise?
Upvotes: 6
Views: 4113
Reputation: 2942
Consider Catvar:
A Categorial-Variation Database (or Catvar) is a database of clusters of uninflected words (lexemes) and their categorial (i.e. part-of-speech) variants. For example, the words hunger(V), hunger(N), hungry(AJ) and hungriness(N) are different English variants of some underlying concept describing the state of being hungry. Another example is the developing cluster:(develop(V), developer(N), developed(AJ), developing(N), developing(AJ), development(N)).
Upvotes: 4
Reputation: 9256
Considering getting a dump of wiktionary and extracting this information out of it.
http://en.wiktionary.org/wiki/sell mentions many of the forms of the word (sells, selling, sold).
If your aim is simply to normalize words to some base canonical form, considering using a lemmatizer or stemmer. Trying playing with morpha which is a really good english lemmatizer.
Upvotes: 1
Reputation: 59
I am not sure what you are looking for but I think WordNet
-- a lexical database for the English language -- would be a good place to start. Read more at http://wordnet.princeton.edu/
The link I referred to you says that
WordNet's structure makes it a useful tool for computational linguistics and natural language processing.
Upvotes: 3