Jacob

Reputation: 13

How can I load tweets from a JSON file into a pandas DataFrame?

I'm trying to read my Twitter data, saved in JSON format, using the following code:

import json
import pandas as pd

with open(file, 'r') as f:
    line = f.readline()
    tweet = json.loads(line)
    df1 = pd.DataFrame(tweet)

This code reads only one tweet and it works, but when I try to read the whole file with:

with open(file, 'r') as f:
    for line in f:
        tweet = json.loads(line)

I receive an error:

JSONDecodeError: Expecting value: line 2 column 1 (char 1)

What should I change to read this file properly?

My main task is to find the creation dates of those tweets. I found them using the following filters (on the single tweet that loaded successfully at the beginning):

df2 = df1[["user"]]
df3 = df2.loc[['created_at']]
df3

Is there a better way than DataFrames to handle this data?

Upvotes: 0

Views: 3795

Answers (2)

Davide Fiocco

Reputation: 5914

A more succinct way to read in the whole JSON file is:

import pandas as pd
df = pd.read_json("python.json", orient='records', lines=True)

You can then apply transformations to df to get the data from the columns you are interested in.
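As a minimal sketch of such a transformation, the snippet below pulls the creation date and the user's screen name out of a frame loaded with lines=True. The two sample tweets, the `sample` variable, and the new `screen_name` column are made up for illustration, and the date format string assumes the usual Twitter `created_at` layout. `convert_dates=False` is passed so that the dates are parsed explicitly rather than by pandas' automatic guess.

```python
import io

import pandas as pd

# Hypothetical sample: two tweets in line-delimited JSON with a blank line between them
sample = (
    '{"created_at": "Wed Oct 10 20:19:24 +0000 2018", "text": "first", '
    '"user": {"screen_name": "alice"}}\n'
    "\n"
    '{"created_at": "Wed Oct 10 20:21:05 +0000 2018", "text": "second", '
    '"user": {"screen_name": "bob"}}\n'
)

# Read one JSON object per line; keep created_at as plain strings for now
df = pd.read_json(io.StringIO(sample), lines=True, convert_dates=False)

# Parse the creation dates explicitly (format assumed from the sample above)
df["created_at"] = pd.to_datetime(df["created_at"], format="%a %b %d %H:%M:%S %z %Y")

# The nested "user" object arrives as a dict per row; pull one field out of it
df["screen_name"] = df["user"].apply(lambda user: user["screen_name"])

print(df[["screen_name", "created_at", "text"]])
```

With a real file, the io.StringIO(sample) argument would be replaced by the file path.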

Upvotes: 2

Bhushan Pant

Reputation: 1580

You can do something like this:

import pandas as pd

# results holds the tweet data. The attribute access below (results[i].user)
# requires tweet objects, not the plain dicts returned by json.loads.

# Define the columns you want to extract
resultFrame = pd.DataFrame(columns=["username", "created_at", "tweet"])
print(len(results))

for i in range(len(results)):
    resultFrame.loc[i, "username"] = results[i].user.screen_name
    resultFrame.loc[i, "created_at"] = results[i].created_at
    resultFrame.loc[i, "tweet"] = results[i].text

print(resultFrame.head())
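If the tweets come straight from a line-delimited JSON file, as in the question, a similar frame can be built from plain dicts. The sketch below is one possible variant, not the only way to do it: `load_tweets` and `sample_lines` are hypothetical names, and the field names assume the standard tweet layout. It skips blank lines because the error quoted in the question is exactly what json.loads raises for a line containing only a newline, so blank lines between records are a likely (though unconfirmed) cause.

```python
import json

import pandas as pd


def load_tweets(lines):
    """Build a DataFrame from an iterable of JSON lines, skipping blank ones."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:  # json.loads raises "Expecting value" on an empty line
            continue
        tweet = json.loads(line)
        rows.append(
            {
                "username": tweet["user"]["screen_name"],
                "created_at": tweet["created_at"],
                "tweet": tweet["text"],
            }
        )
    # Building the frame once from a list of dicts avoids row-by-row .loc writes
    return pd.DataFrame(rows, columns=["username", "created_at", "tweet"])


# Hypothetical sample lines, including the kind of blank line that breaks json.loads
sample_lines = [
    '{"created_at": "Wed Oct 10 20:19:24 +0000 2018", "text": "first", "user": {"screen_name": "alice"}}\n',
    "\n",
    '{"created_at": "Wed Oct 10 20:21:05 +0000 2018", "text": "second", "user": {"screen_name": "bob"}}\n',
]

result_frame = load_tweets(sample_lines)
print(result_frame.head())
```

For a real file, the open file handle can be passed directly, for example load_tweets(f) inside a with open(file, 'r') as f: block.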

Upvotes: 1
