Reputation: 466
I want to load JSON data mined from the Twitter API into Python. Attached is a sample of the JSON file:
{"created_at":"Mon Apr 22 18:17:09 +0000 2019","id":1120391103813910529,"id_str":"1120391103813910529","text":"On peut dire que la base de cette 8e saison est en place \ud83d\ude4c #GOTS8E2","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":243071138,"id_str":"243071138","name":"Mr B","screen_name":"skeyos","location":"Namur","url":null,"description":null,"translator_type":"none","protected":false,"verified":false,"followers_count":197,"friends_count":1811,"listed_count":6,"favourites_count":7826,"statuses_count":8044,"created_at":"Wed Jan 26 06:49:05 +0000 2011","utc_offset":null,"time_zone":null,"geo_enabled":true,"lang":"fr","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/493833348167770112\/aGLGemZ5_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/493833348167770112\/aGLGemZ5_normal.jpeg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/243071138\/1406574068","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"GOTS8E2","indices":[59,67
]}],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"fr","timestamp_ms":"1555957029666"}
{"created_at":"Mon Apr 22 18:17:14 +0000 2019","id":1120391124722565123,"id_str":"1120391124722565123","text":"...
I am trying the following code:
import json
with open('tweets.json') as tweet_data:
    json_data = json.load(tweet_data)
But I get the following error:
JSONDecodeError: Extra data: line 3 column 1 (char 2149)
Unfortunately, I cannot edit the JSON file much by hand, as it is really big. I need to figure out how to read it into Python. Any help would be greatly appreciated!
Edit: It works with the following code:
import json
dat = []
with open('data_tweets_E2.json', 'r') as f:
    for line in f:
        if not line.strip():  # skip empty lines
            continue
        json_data = json.loads(line)  # parse one JSON object per line
        dat.append(json_data)
Upvotes: 3
Views: 4240
Reputation: 1436
Here is the code. You need to install pandas first, of course. If the solution helps you, please mark this answer as accepted.
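import json
import pandas as pd

# json.load expects the whole file to be a single JSON document,
# for example one array of objects.
with open('tweets.json') as json_file:
    data_list = json.load(json_file)

tweet_data_frame = pd.DataFrame.from_dict(data_list)
print(tweet_data_frame)
print(data_list)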
As you can see, print(data_list) prints out a list and print(tweet_data_frame) prints out a DataFrame.
If you want to see the types of these variables, use type(), for example print(type(data_list)).
Important: the code above only works if the file is a single valid JSON document. Your file contains several top-level objects one after another, which json.load rejects with the "Extra data" error. If you have more than one JSON object, they need to be in an array: [{"example":"value"},{"example":"value"}]. Try it with a file in that format.
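If editing the file by hand is not an option, the array form described above can be built in code. The following is only a minimal sketch, assuming every non-blank line holds exactly one complete JSON object; the helper name ndjson_to_array and the two-line sample are made up for illustration.

```python
import json

def ndjson_to_array(lines):
    """Parse one JSON object per non-blank line and return them as a list."""
    return [json.loads(line) for line in lines if line.strip()]

# Hypothetical sample standing in for the real tweet file.
sample = '{"id": 1, "lang": "fr"}\n\n{"id": 2, "lang": "en"}\n'

records = ndjson_to_array(sample.splitlines())

# Serializing the list gives one JSON array, which json.load/json.loads
# can read back in a single call.
array_text = json.dumps(records)
print(array_text)
```

With a real file you would pass the open file object instead of sample.splitlines(). The resulting list of dicts can also go straight into pd.DataFrame(records), and pandas can read this layout directly with pd.read_json(path, lines=True).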
Upvotes: 2
Reputation: 23176
Each line contains a separate JSON object, so parse each line and store the results in a list:
import json
with open('tweets.json', 'r') as tweet_data:
    values = [json.loads(line) for line in tweet_data
              if line.strip()]  # skip empty lines
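To see why parsing the whole file at once fails while parsing per line works, here is a small reproduction. It is a sketch using a made-up two-object string rather than the real tweet data.

```python
import json

# Hypothetical sample: two objects, one per line, like the tweet file.
sample = '{"a": 1}\n{"b": 2}\n'

# Parsing the whole text at once fails: after the first object the parser
# finds more content and reports it as extra data.
try:
    json.loads(sample)
    error = None
except json.JSONDecodeError as exc:
    error = exc

print(error)

# Line by line, each piece is a complete JSON document on its own.
values = [json.loads(line) for line in sample.splitlines() if line.strip()]
print(values)
```

The same applies to json.load on a file object, since it reads the entire file and parses it as one document.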
Upvotes: 1
Reputation: 264
Every line contains a new object, so try parsing them line by line.
import json
with open('tweets.json', 'r') as f:
    for line in f:
        if not line.strip():  # skip empty lines
            continue
        json_data = json.loads(line)  # one JSON object per line
        print(json_data)
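Since the file is described as really big, the same loop can be wrapped in a generator so that only one line is held in memory at a time. This is a sketch under the same one-object-per-line assumption; iter_json_lines is a hypothetical helper name, and io.StringIO stands in for the real file here.

```python
import io
import json

def iter_json_lines(fileobj):
    """Yield one parsed object per non-blank line of an open text file."""
    for lineno, line in enumerate(fileobj, start=1):
        line = line.strip()
        if not line:  # skip empty lines
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line of the file is malformed.
            raise ValueError(f"line {lineno}: {exc.msg}") from exc

# In-memory stand-in for open('tweets.json'); the content is made up.
sample_file = io.StringIO('{"id": 1}\n\n{"id": 2}\n')
tweets = list(iter_json_lines(sample_file))
print(len(tweets))
```

With the real file you would write with open('tweets.json') as f: and loop over iter_json_lines(f), processing each tweet as it arrives instead of collecting them all in a list.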
Upvotes: 1