Reputation: 627
Can anyone recommend a way that I can do this Python code as a MongoDB query?
import pandas as pd
data = pd.read_csv("elonmusk.csv")
from collections import Counter
Counter(" ".join(data["tweet"]).split()).most_common(100)
I am looking for help to write a MongoDB query that can create a similar output as the Python code shown here.
Analysing all the text of one field and returning the most common words.
I believe MongoDB word cloud link here has a similar solution https://docs.mongodb.com/charts/saas/chart-type-reference/word-cloud/ However I have to write the code in the MongoDB shell.
I was not sure how to apply the following Stackoverflow solution in this link Most frequent word in MongoDB collection
Thanks in advance for any advice.
Upvotes: 1
Views: 635
Reputation: 8705
Query
$set
to fix structure (you can use project also its the same)aggregate(
[{"$match": {"$expr": {"$eq": [{"$type": "$tweet"}, "string"]}}},
{"$set":
{"matches": {"$regexFindAll": {"input": "$tweet", "regex": "\\w+"}}}},
{"$project": {"tokens": "$matches.match"}},
{"$unwind": {"path": "$tokens"}},
{"$group": {"_id": "$tokens", "count": {"$sum": 1}}},
{"$set": {"word": "$_id", "_id": "$$REMOVE"}},
{"$sort": {"count": -1}},
{"$limit": 100}])
Upvotes: 1