WildBill

Reputation: 9291

How to add data to MongoDB in Python

I am a Python noob (working with it for less than a few hours). I'm trying to read in Twitter data and store it in a MongoDB database, but I am getting the following error:

Traceback (most recent call last):
  File "twit_test.py", line 8, in on_receive
    db.posts.insert(data)
  File "/Library/Python/2.6/site-packages/pymongo-2.0.1-py2.6-macosx-10.6-universal.egg/pymongo/collection.py", line 274, in insert
  File "/Library/Python/2.6/site-packages/pymongo-2.0.1-py2.6-macosx-10.6-universal.egg/pymongo/database.py", line 249, in _fix_incoming
  File "/Library/Python/2.6/site-packages/pymongo-2.0.1-py2.6-macosx-10.6-universal.egg/pymongo/son_manipulator.py", line 73, in transform_incoming
TypeError: 'str' object does not support item assignment
Traceback (most recent call last):
  File "twit_test.py", line 17, in <module>
    conn.perform() 

My code is very simple:

import pycurl, json
import pymongo

STREAM_URL = "https://stream.twitter.com/1/statuses/sample.json"
USER = "XXXXXXXX"
PASS = "XXXXXXXX"
def on_tweet(data):
  tweet = json.loads(data)
  db.posts.insert(tweet)

from pymongo import Connection
connection = Connection()
db = connection.test
conn = pycurl.Curl()
conn.setopt(pycurl.USERPWD, "%s:%s" % (USER, PASS))
conn.setopt(pycurl.URL, STREAM_URL)
conn.setopt(pycurl.WRITEFUNCTION, on_tweet)
conn.perform() 

I'm sure this is a VERY simple fix, hope you guys can help. Thanks!

Upvotes: 3

Views: 7677

Answers (3)

WildBill

Reputation: 9291

The current code, with the edits above, works. I was querying the DB incorrectly and expected to see more traffic in the mongo console than I did.

Thanks much to the guys who helped, you got me on the right track and to the right answer!

Upvotes: 0

dcrosta

Reputation: 26258

PyMongo's insert method takes a dictionary, not a string. The error you're seeing is where PyMongo attempts to assign an ObjectId for the new record (since it doesn't yet have one) before sending to the database.
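
To see why a string triggers this particular TypeError, here is a minimal sketch in plain Python, no PyMongo required. The helper add_id is hypothetical and only mimics the idea of assigning an "_id" key to the incoming document; it is not PyMongo's actual code, and the payload is made up.

```python
import json
import uuid


def add_id(doc):
    """Hypothetical stand-in for the step that gives a new document an _id."""
    if "_id" not in doc:
        # Item assignment works on a dict but raises TypeError on a str.
        doc["_id"] = uuid.uuid4().hex
    return doc


raw = '{"text": "hello", "user": "example"}'  # made-up payload for illustration

try:
    add_id(raw)  # raw is still a str here
    error = None
except TypeError as exc:
    error = str(exc)

tweet = add_id(json.loads(raw))  # decoded to a dict first, so this succeeds
```

Passing the raw string fails at the item assignment, while the decoded dictionary gets its "_id" without complaint.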

I think the error is in your on_receive function. Unless pycurl is converting the JSON for you automatically, it's very likely just giving you a raw string result from Twitter's API. You should use the json module to decode the string, then handle the resulting type appropriately -- that is, if it's an array, iterate over each item, determine whether it needs to be saved (i.e. whether you already have it in your database), and if not, then issue insert just on those elements which are new.
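
A rough sketch of that decode-then-insert flow follows. The helper names decode_payload and save_new, the "id" field used to detect duplicates, and the list standing in for the collection are all assumptions for illustration; with a real collection you would query it and call its insert method instead.

```python
import json


def decode_payload(raw):
    """Decode a raw JSON string and return a list of dict documents."""
    decoded = json.loads(raw)
    if isinstance(decoded, dict):
        return [decoded]
    if isinstance(decoded, list):
        # Keep only JSON objects; bare strings or numbers cannot be inserted.
        return [item for item in decoded if isinstance(item, dict)]
    return []


def save_new(raw, stored, seen_ids):
    """Append documents whose "id" has not been seen yet; return the count."""
    added = 0
    for doc in decode_payload(raw):
        doc_id = doc.get("id")  # assumed duplicate key, not confirmed by the source
        if doc_id is not None:
            if doc_id in seen_ids:
                continue
            seen_ids.add(doc_id)
        stored.append(doc)  # stand-in for db.posts.insert(doc)
        added += 1
    return added


stored, seen_ids = [], set()
first = save_new('[{"id": 1, "text": "a"}, {"id": 2, "text": "b"}]', stored, seen_ids)
second = save_new('{"id": 2, "text": "b"}', stored, seen_ids)
```

The second call adds nothing because the document with "id" 2 was already stored by the first.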

EDIT: You should also add the safe=True keyword argument to insert. If there is an error that is caught on the server side, you will then get an exception from PyMongo which will help diagnose the problem.

Upvotes: 2

moliware

Reputation: 10278

In on_receive you have to buffer the content. Once a "\r\n" arrives, you have a complete tweet, which can then be stored in MongoDB:

def on_tweet(data):
    tweet = json.loads(data)
    db.posts.insert(tweet)


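buffer = ""

def on_receive(data):
    global buffer  # without this, "buffer += ..." raises UnboundLocalError
    buffer += data
    if data.endswith("\r\n"):
        # A complete tweet has arrived: store it and start a fresh buffer.
        tweet = buffer.strip()
        if tweet:
            on_tweet(tweet)
        buffer = ""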
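
If a single chunk can carry more than one tweet, or end in the middle of one, splitting the accumulated text on the delimiter is more robust than checking only how the chunk ends. A minimal sketch, assuming "\r\n" delimits tweets as described above and that the callback receives text strings as in the question's Python 2 setup; the TweetBuffer class and its handle callback are hypothetical names:

```python
class TweetBuffer(object):
    """Accumulate streamed chunks and hand each complete tweet to a callback."""

    def __init__(self, handle):
        self.handle = handle  # called once per complete tweet string
        self.pending = ""     # text received after the last delimiter

    def feed(self, data):
        self.pending += data
        # Everything before the last delimiter is complete; the rest waits.
        pieces = self.pending.split("\r\n")
        self.pending = pieces.pop()
        for piece in pieces:
            if piece.strip():  # skip blank lines between tweets
                self.handle(piece)


received = []
buf = TweetBuffer(received.append)
buf.feed('{"id": 1}\r\n{"id"')   # one full tweet plus the start of another
buf.feed(': 2}\r\n\r\n')         # the rest of the second tweet, then a blank line
```

In the question's setup, on_tweet would be passed as handle and buf.feed registered as the write callback in place of on_receive.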

EDIT: I thought you were using the old streaming API. The "on_tweet" function should be enough.

Upvotes: 2
