Reputation: 1222
I need to analyze a user's post and categorize it. For example, I have to categorize every post as a "buy" post or a "sell" post based on its text: "I'm looking to sell my house" is categorized as "sell". The problem is that often it's not so simple: "I'm looking to get rid of my old house" also needs to be categorized as "sell", while "I'm looking for a house" becomes "buy". I would also like to categorize these posts by the item in question; for example, "I'm looking for a house" would be categorized as both "buy" and "house".
Can anyone recommend a good approach / good framework / technique when it comes to analyzing and understanding user input? Thanks.
Upvotes: 1
Views: 1163
Reputation: 23503
You're right; it's a hard thing to do.
Yahoo! has a Term Extraction API/web service you can use. It's a pretty good way to run language analysis on your own text without writing a million lines of code to do it yourself. I haven't used it, so I've no idea how well it handles different phrasings with the same meaning, which is what your question asks about.
Upvotes: 2
Reputation: 625097
What you're talking about is basically a text classification problem, and Bayesian filtering, the same technique used for spam filtering, is a common way to tackle it. See also this talk. It's a reasonably complicated area.
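To make the Bayesian idea concrete, here is a minimal sketch of a naive Bayes text classifier with add-one smoothing, using only the Python standard library. The `NaiveBayes` class, the `tokenize` helper, and the six training sentences are all made up for illustration; a real system would need far more labelled posts and probably an established library.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Toy multinomial naive Bayes classifier (illustrative, not tuned)."""

    def __init__(self):
        self.label_counts = Counter()            # training posts per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()                       # every word seen in training

    def train(self, text, label):
        self.label_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def classify(self, text):
        total_posts = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_posts in self.label_counts.items():
            # Log prior: how common this label was in the training data.
            score = math.log(n_posts / total_posts)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                # Add-one (Laplace) smoothing avoids zero probabilities.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data, invented for this example.
classifier = NaiveBayes()
classifier.train("I'm looking to sell my house", "sell")
classifier.train("I want to get rid of my old car", "sell")
classifier.train("Selling my bike, barely used", "sell")
classifier.train("I'm looking for a house", "buy")
classifier.train("I want to buy a car", "buy")
classifier.train("Looking to purchase a bike", "buy")

print(classifier.classify("I'm looking to get rid of my old house"))
print(classifier.classify("I'm looking for a house"))
```

The classifier never sees the word "sell" in the first test sentence; it picks the label from words such as "get", "rid", and "old" that appeared only in the "sell" examples. Categorizing by item could follow the same pattern with a second classifier trained on item labels such as "house" or "car", though that is an assumption about how you would structure it rather than a tested recipe.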
Upvotes: 3