Reputation: 43
I've been streaming data from Twitter into a MongoDB database. However, I found that I had formatted the search incorrectly, so I got data from all over the place instead of the one city I wanted (I find the location by checking whether the city name comes up in 'location' or 'name' under 'user' in the JSON).
I want to copy just the correct documents to a new collection, but I've found it nearly impossible to do in pymongo! I'm using pymongo instead of the shell because I'm using regular expressions to search for the city names (there are a lot of synonyms for it).
regex = re.compile(<really long regular expression of city names>)
I've been able to use find() correctly with the regular expressions; it returns just what I'm looking for:
db.coll.find({'$or': [{'user.location': {'$in': [regex]}}, {'user.name': {'$in': [regex]}}]})
I just need to copy what it returns into a new collection, but it's proving difficult.
I tried the method below, which I found here: calling forEach() on the result with the JavaScript function wrapped in bson.Code to copy the documents. It still won't work.
db.coll.find({'$or': [{'user.location': {'$in': [regex]}}, {'user.name': {'$in': [regex]}}]})\
    .forEach(bson.Code('''
        function(doc) {
            db.subset.insert(doc);
        }'''))
Specifically, the error I get when I try this is
I have no idea what is wrong or how to go about fixing it. Can anyone tell me how to fix this, or suggest a better way to copy documents to a new collection?
Upvotes: 0
Views: 2508
Reputation: 2636
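A pymongo cursor is already iterable in Python, so you don't need forEach(), which is a mongo shell method rather than something a pymongo cursor provides. Just loop over the results. Try: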
for tweet in db.coll.find({'$or': [{'user.location': {'$in': [regex]}}, {'user.name': {'$in': [regex]}}]}):
    # insert() is the older pymongo call; on pymongo 3+ use insert_one(tweet)
    db.subset.insert(tweet)
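If the collection is large, inserting one document per round trip can be slow. Here is a rough sketch of the same loop that batches the writes with insert_many(). The helper names (build_city_query, copy_matching), the batch size, and the example pattern are my own placeholders, not anything from your code, and I'm assuming source and target are pymongo Collection objects. It also passes the compiled regex directly as the field value, which pymongo accepts, instead of wrapping it in '$in'.

```python
import re

# Hypothetical stand-in for the real (much longer) pattern of city names.
EXAMPLE_REGEX = re.compile(r'new york|nyc', re.IGNORECASE)


def build_city_query(regex):
    """Match documents whose user.location or user.name matches regex."""
    return {'$or': [{'user.location': regex}, {'user.name': regex}]}


def copy_matching(source, target, regex, batch_size=500):
    """Copy matching documents from source to target in batches.

    source and target are assumed to be pymongo Collection objects
    (anything exposing find() and insert_many() will do).
    Returns the number of documents copied.
    """
    copied = 0
    batch = []
    for doc in source.find(build_city_query(regex)):
        batch.append(doc)
        if len(batch) >= batch_size:
            target.insert_many(batch)
            copied += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        target.insert_many(batch)
        copied += len(batch)
    return copied


# Usage, assuming a running MongoDB server and a database named "mydb":
#   from pymongo import MongoClient
#   db = MongoClient().mydb
#   copy_matching(db.coll, db.subset, EXAMPLE_REGEX)
```

The copied documents keep their original _id values, so running this twice against the same target collection should fail with duplicate key errors unless you drop db.subset first.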
Upvotes: 1