Reputation: 13
I'm trying to read my Twitter data, saved in JSON format, using the following code:
import json
import pandas as pd

with open(file, 'r') as f:
    line = f.readline()
    tweet = json.loads(line)
df1 = pd.DataFrame(tweet)
This code reads only one tweet and it works, but when I try to read the whole file with:
with open(file, 'r') as f:
    for line in f:
        tweet = json.loads(line)
I receive an error:
JSONDecodeError: Expecting value: line 2 column 1 (char 1)
What should I change to read this file properly?
My main task is to find the creation dates of those tweets. I found one using the following filters (applied to the single tweet that loaded correctly at the beginning):
df2 = df1[["user"]]
df3 = df2.loc[['created_at']]
df3
Is there a better way than DataFrames to handle this data?
Upvotes: 0
Views: 3795
Reputation: 5914
A more succinct way to read your whole JSON file is:
import pandas as pd
df = pd.read_json("python.json", orient='records', lines=True)
You can then apply transformations to df
to get the data from the columns you are interested in.
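If you would rather keep your manual loop, here is a minimal sketch using only the standard library. It assumes the error comes from blank lines between the tweets (calling json.loads on an empty line raises exactly this kind of "Expecting value" error) and that created_at uses the classic Twitter API date format. The names iter_tweets and creation_dates and the sample data are made up for illustration.

```python
import json
from datetime import datetime

# Assumed format of the "created_at" field,
# e.g. "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def iter_tweets(lines):
    """Yield one parsed tweet per non-empty line."""
    for line in lines:
        line = line.strip()
        if not line:  # skip blank separator lines instead of failing on them
            continue
        yield json.loads(line)


def creation_dates(lines):
    """Return the creation date of each tweet as a datetime object."""
    return [
        datetime.strptime(tweet["created_at"], TWITTER_DATE_FORMAT)
        for tweet in iter_tweets(lines)
    ]


# Hypothetical sample data: two tweets separated by a blank line.
sample = [
    '{"created_at": "Wed Oct 10 20:19:24 +0000 2018", "text": "first"}\n',
    '\n',
    '{"created_at": "Thu Oct 11 08:00:00 +0000 2018", "text": "second"}\n',
]
dates = creation_dates(sample)
```

With a real file you would pass the open file object instead of the sample list, for example creation_dates(f) inside your with open(file, 'r') as f: block.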
Upvotes: 2
Reputation: 1580
You can do something like this:
import pandas as pd

# results holds the tweet data; the attribute access below (e.g. .user.screen_name)
# assumes tweet objects rather than plain dicts.
# Define the columns you want to extract
resultFrame = pd.DataFrame(columns=["username", "created_at", "tweet"])
print(len(results))
for i in range(len(results)):
    resultFrame.loc[i, "username"] = results[i].user.screen_name
    resultFrame.loc[i, "created_at"] = results[i].created_at
    resultFrame.loc[i, "tweet"] = results[i].text
print(resultFrame.head())
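If your tweets are plain dicts (as json.loads returns them) rather than objects, the same extraction can be sketched as below. This is only an illustration: tweet_to_row is a hypothetical helper, the sample tweets are invented, and the field names assume the classic Twitter API layout with screen_name nested under user.

```python
def tweet_to_row(tweet):
    """Flatten one tweet dict into the three fields of interest."""
    return {
        "username": tweet["user"]["screen_name"],
        "created_at": tweet["created_at"],
        "tweet": tweet["text"],
    }


# Hypothetical sample data standing in for the parsed tweets.
results = [
    {
        "created_at": "Wed Oct 10 20:19:24 +0000 2018",
        "text": "first",
        "user": {"screen_name": "alice"},
    },
    {
        "created_at": "Thu Oct 11 08:00:00 +0000 2018",
        "text": "second",
        "user": {"screen_name": "bob"},
    },
]

# One flat dict per tweet; each key becomes a column name.
rows = [tweet_to_row(tweet) for tweet in results]
```

Passing rows to pd.DataFrame(rows) then builds the whole frame in one step, which is usually simpler than assigning cell by cell with .loc.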
Upvotes: 1