Thelonious Monk

Reputation: 466

How to load a Twitter JSON object into Python

I want to load JSON data mined from the Twitter API into Python. Here is a sample of the JSON objects:

{"created_at":"Mon Apr 22 18:17:09 +0000 2019","id":1120391103813910529,"id_str":"1120391103813910529","text":"On peut dire que la base de cette 8e saison est en place \ud83d\ude4c #GOTS8E2","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":243071138,"id_str":"243071138","name":"Mr B","screen_name":"skeyos","location":"Namur","url":null,"description":null,"translator_type":"none","protected":false,"verified":false,"followers_count":197,"friends_count":1811,"listed_count":6,"favourites_count":7826,"statuses_count":8044,"created_at":"Wed Jan 26 06:49:05 +0000 2011","utc_offset":null,"time_zone":null,"geo_enabled":true,"lang":"fr","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/493833348167770112\/aGLGemZ5_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/493833348167770112\/aGLGemZ5_normal.jpeg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/243071138\/1406574068","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"GOTS8E2","indices":[59,67]}],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"fr","timestamp_ms":"1555957029666"}

{"created_at":"Mon Apr 22 18:17:14 +0000 2019","id":1120391124722565123,"id_str":"1120391124722565123","text":"...

I tried the following code:

import json

with open('tweets.json') as tweet_data:
    json_data = json.load(tweet_data)

But I get the following error:

JSONDecodeError: Extra data: line 3 column 1 (char 2149)

Unfortunately, I can't edit the JSON file by hand because it is very large, so I need a way to read it into Python as it is. Any help would be greatly appreciated!
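`json.load` expects the whole file to contain exactly one JSON value. When each tweet is a separate top-level object, the parser reads the first one and then reports "Extra data" where the next one begins (in the real file a blank line presumably separates the tweets, which would explain "line 3 column 1"). A minimal reproduction, using a hypothetical two-object string in place of tweets.json and a hypothetical helper name:

```python
import json

# Hypothetical stand-in for tweets.json: two JSON objects, one per line
SAMPLE = '{"id": 1, "lang": "fr"}\n{"id": 2, "lang": "en"}\n'


def describe_decode_error(text):
    """Return (message, line, column) of the JSONDecodeError, or None if text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.msg, exc.lineno, exc.colno
    return None


print(describe_decode_error(SAMPLE))  # ('Extra data', 2, 1)
# Each line on its own is a valid JSON document
print(describe_decode_error(SAMPLE.splitlines()[0]))  # None
```

This is why parsing the file one line at a time works where a single `json.load` call fails.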

Edit: It works with the following code:

import json

dat = []
with open('data_tweets_E2.json', 'r', encoding='utf-8') as f:
    for line in f:
        if not line.strip():  # skip empty lines
            continue

        json_data = json.loads(line)
        dat.append(json_data)
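The working code relies on every tweet sitting on exactly one line. If that might not hold (objects concatenated without newlines, or pretty-printed across several lines), one alternative is `json.JSONDecoder.raw_decode`, which parses one value starting at a given index and returns where it stopped. A sketch under that assumption; `iter_json_objects` is a hypothetical name, and it reads the whole text into memory:

```python
import json

_decoder = json.JSONDecoder()


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated JSON values."""
    pos = 0
    length = len(text)
    while True:
        # Skip whitespace (spaces, newlines, blank lines) between objects;
        # raw_decode itself does not accept leading whitespace
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            return
        # Parse one value starting at pos; raw_decode returns (obj, end_index)
        obj, pos = _decoder.raw_decode(text, pos)
        yield obj


# Objects on separate lines, back to back, and spread over several lines
sample = '{"id": 1}\n\n{"id": 2}{"id": 3}\n{\n  "id": 4\n}\n'
print([tweet["id"] for tweet in iter_json_objects(sample)])  # [1, 2, 3, 4]
```

With the real file this would be used as `list(iter_json_objects(f.read()))` inside a `with open(...)` block.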

Upvotes: 3

Views: 4240

Answers (3)

mmm

Reputation: 1436

Here is the code. You need to install pandas first, of course. If the solution helped you, please mark this answer with the green check.

import json
import pandas as pd

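# json.load needs the whole file to be one JSON document, e.g. an array of tweet objects
with open('tweets.json') as json_file:
    data_list = json.load(json_file)

# A list of dicts becomes a DataFrame with one row per dict
tweet_data_frame = pd.DataFrame.from_dict(data_list)
print(tweet_data_frame)
print(data_list)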

As you can see, print(data_list) prints a list and print(tweet_data_frame) prints a DataFrame.

To check the types of these variables, use type(), for example print(type(data_list)).

Important: your file is not formatted as a single JSON document. If it holds several JSON objects, they need to be wrapped in one array, like [{"example":"value"},{"example":"value"}]. Try the code with a file in that format.
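If you want to keep the single `json.load` call from this answer, one option is to convert the one-object-per-line file into a JSON array once. A minimal sketch, assuming one tweet per line; `lines_to_array` is a hypothetical helper that takes already-open text file objects, shown here with in-memory files:

```python
import io
import json


def lines_to_array(src, dst):
    """Copy one-object-per-line JSON from src into a single JSON array written to dst.

    src and dst are open text file objects; returns the number of objects written.
    """
    # Skip blank lines and parse every other line as one JSON object
    objects = [json.loads(line) for line in src if line.strip()]
    # ensure_ascii=False keeps emoji and accented characters readable in the output
    json.dump(objects, dst, ensure_ascii=False)
    return len(objects)


# Demo with in-memory files; for real files, pass open(...) handles instead
src = io.StringIO('{"id": 1, "lang": "fr"}\n\n{"id": 2, "lang": "en"}\n')
dst = io.StringIO()
lines_to_array(src, dst)
print(dst.getvalue())  # [{"id": 1, "lang": "fr"}, {"id": 2, "lang": "en"}]
```

For real files, open the source for reading and the destination with `'w'` and `encoding='utf-8'`; `json.load` on the new file then returns a list that `pd.DataFrame.from_dict` accepts.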

Upvotes: 2

donkopotamus

Reputation: 23176

Each line contains a separate JSON object, so parse each line and store the results in a list:

import json

with open('tweets.json', 'r') as tweet_data:
    # Keep only non-blank lines and parse each one as a separate JSON object
    values = [json.loads(line) for line in tweet_data if line.strip()]
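The filter in the comprehension must keep non-blank lines (`if line.strip()`). Negated, it keeps only the blank lines, and `json.loads` then fails with "Expecting value". A quick check with a hypothetical in-memory stand-in for tweets.json:

```python
import io
import json

# Hypothetical stand-in for tweets.json, with a blank line between tweets
tweet_data = io.StringIO('{"id": 1}\n\n{"id": 2}\n')

# Keep only non-blank lines, then parse each one
values = [json.loads(line) for line in tweet_data if line.strip()]
print(values)  # [{'id': 1}, {'id': 2}]

# With the condition negated, only the blank line reaches json.loads
tweet_data.seek(0)
try:
    [json.loads(line) for line in tweet_data if not line.strip()]
except json.JSONDecodeError as exc:
    print(exc.msg)  # Expecting value
```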

Upvotes: 1

hingev

Reputation: 264

Every line contains a new object, so try parsing them line by line.

import json

with open('tweets.json', 'r') as f:
    for line in f:
        if not line.strip():  # skip empty lines
            continue

        json_data = json.loads(line)
        print(json_data)
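Since the file is very large, you may prefer not to keep every tweet in a list. A sketch of a generator that yields one parsed tweet at a time and reports which line was malformed; `iter_tweets` is a hypothetical helper, and it assumes one tweet per line:

```python
import io
import json


def iter_tweets(lines):
    """Yield one parsed tweet per non-blank line; lines can be an open file."""
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():  # skip empty lines
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Name the offending line instead of only a character offset
            raise ValueError(f"line {line_number}: {exc.msg}") from exc


# Demo with an in-memory file; with the real file:
#   with open('tweets.json', encoding='utf-8') as f:
#       for tweet in iter_tweets(f): ...
sample = io.StringIO('{"id": 1, "lang": "fr"}\n\n{"id": 2, "lang": "en"}\n')
for tweet in iter_tweets(sample):
    print(tweet["id"], tweet["lang"])
```

Only one line and one parsed tweet are held in memory at a time, so memory use stays roughly constant regardless of file size.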

Upvotes: 1
