Python: Import Tweet unicode data to pandas data frame object

Question

I am trying to import a file with the structure below (a dump of tweets, with unicode strings) into a pandas DataFrame. I assume the first step is to load it into a JSON object and then convert that to a DataFrame (per p. 166 of McKinney's Python for Data Analysis), but I am unsure and could use some pointers.

import sys, tailer
tweet_sample = tailer.head(open(r'\usTweets0.json'), 3)
tweet_sample # returns
['{u\'contributors\': None, u\'truncated\': False, u\'text\': u\'@KREAYSHAWN is...

Andy Hayden · Accepted Answer

Just use the DataFrame constructor...

In [5]: import pandas as pd

In [6]: tweet_sample = [{'contributors': None, 'truncated': False, 'text': 'foo'}, {'contributors': None, 'truncated': True, 'text': 'bar'}]

In [7]: df = pd.DataFrame(tweet_sample)

In [8]: df
Out[8]:
  contributors text truncated
0         None  foo     False
1         None  bar      True

If the file is valid JSON, you can load it with json.load:

import json
# raw string, so that '\u' is not read as the start of a unicode escape
with open(r'\usTweets0.json', 'r') as f:
    tweet_sample = json.load(f)
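
Note that the sample shown in the question looks like Python dict reprs (u'...' keys, None, False) rather than JSON, in which case json.load would reject it. Here is a minimal sketch for that case, assuming each line of the file holds one dict literal. The helper name parse_repr_lines and the sample lines are made up for illustration:

```python
import ast

def parse_repr_lines(lines):
    """Parse an iterable of strings, each holding one Python dict literal."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # literal_eval only evaluates literals (dicts, strings, None, ...),
        # so it is safer than eval for text you did not write yourself
        records.append(ast.literal_eval(line))
    return records

# Hypothetical stand-in for lines read from the tweet dump
sample_lines = [
    "{u'contributors': None, u'truncated': False, u'text': u'foo'}",
    "{u'contributors': None, u'truncated': True, u'text': u'bar'}",
]
tweet_sample = parse_repr_lines(sample_lines)
```

With a real file, pass the open file object in place of sample_lines. The resulting list of dicts can go straight into the DataFrame constructor.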

There will be a from_json coming soon to pandas...
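
pandas has since shipped this as pd.read_json. Here is a minimal sketch, assuming the dump is line-delimited JSON (one object per line). The inline sample data is invented for illustration:

```python
import io
import pandas as pd

# Hypothetical line-delimited JSON, standing in for the contents of the file
raw = (
    '{"contributors": null, "truncated": false, "text": "foo"}\n'
    '{"contributors": null, "truncated": true, "text": "bar"}\n'
)

# lines=True tells read_json to treat each line as a separate JSON object
df = pd.read_json(io.StringIO(raw), lines=True)
```

A file path can be passed in place of the StringIO object. If the file is a single JSON array rather than one object per line, drop lines=True.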
