VK1

Reputation: 300

(Beginner to) NLP: I am trying to understand how I can categorise words in text to identify all the words related to a topic

I have scraped a website using BeautifulSoup, and now I want to analyse all the scraped text and create a long list of the food items that occur in it.

Example text

If you’re a vegetarian and forever lamenting the fact that you can’t have wontons, these guys are for you! The filling is made with a simple mix of firm tofu crumbles, seasoned with salt, ginger, white pepper, and green onions. It’s super simple but so satisfying. Make sure you drain your tofu well and dry it out as much as possible so that the filling isn’t too wet. You can even go a step further and give it a press: line a plate with paper towels, the put some paper towels on top and weigh the tofu down with another plate. The best thing about these wontons is that the filling is completely cooked so you can adjust the seasoning just by tasting. Just make sure that the filling is slightly more saltier than you would have it if you were just eating it on it’s own. Wonton wrappers don’t have much in the way of seasoning. These guys cook up in a flash because all you’re doing is cooking the wonton wrappers. Once you pop them in the boiling water and they float to the top, you’re good to go. Give them a toss in a spicy-soy-vinegar dressing and you’re in heaven!

I would like to create a long list from this which identifies: wontons, tofu, vinegar, white pepper, onions, salt

I am not sure how I can do this without having a pre-existing list of food items, so any suggestions would be great. I am looking for something that can do this automatically without too much manual intervention! (I am quite new to NLP and deep learning, so any articles or methods you recommend would be super useful!)

Thanks!

Upvotes: 1

Views: 62

Answers (1)

singhV

Reputation: 157

If you are new to this field, you can use Gensim, a free Python library for topic modeling. You can try extracting the food items using Latent Semantic Analysis or similarity queries.

https://radimrehurek.com/gensim/index.html
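
To give a feel for what a similarity query does, here is a minimal plain-Python sketch. It is not Gensim's API and not necessarily what a full solution would look like: it compares raw word counts with cosine similarity, whereas Gensim would typically project the text into an LSI space first. The sentences are taken from the question's example text, and the query words ("tofu salt ginger") are an assumed seed chosen purely for illustration.

```python
import math
import re
from collections import Counter

# A few sentences from the example text in the question.
SENTENCES = [
    "The filling is made with a simple mix of firm tofu crumbles, "
    "seasoned with salt, ginger, white pepper, and green onions.",
    "Make sure you drain your tofu well and dry it out as much as "
    "possible so that the filling isn't too wet.",
    "Wonton wrappers don't have much in the way of seasoning.",
    "Give them a toss in a spicy-soy-vinegar dressing and you're in heaven!",
]


def bag_of_words(text):
    """Lowercase the text and count each alphabetic token."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sentences(query, sentences):
    """Return (sentence index, score) pairs, most similar first."""
    q = bag_of_words(query)
    scores = [(i, cosine(q, bag_of_words(s))) for i, s in enumerate(sentences)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical seed query; any food-related words could be used here.
    for index, score in rank_sentences("tofu salt ginger", SENTENCES):
        print(f"{score:.3f}  {SENTENCES[index][:60]}")
```

Note that this ranks sentences by how closely they match the seed words; it does not by itself produce a list of food items, and it still relies on a few seed terms, so treat it as a starting point for experimenting rather than a complete answer.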

Upvotes: 1
