Chen Vilinsky

Reputation: 79

Opening a JSON format file inside a .txt file

I have been assigned to read multiple .txt files that actually contain JSON data from Twitter, but I get an error when I try to load them with the json package.

    with open(files_path+'/tweets.json.2019-01-15.txt') as f:
        string=f.read()
        data=json.loads(string)
    tweet_df=pd.DataFrame(data)
    print(tweet_df)

The error I get is:

 File "C:\ProgramData\Anaconda3\envs\HW1\lib\json\decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 9762)

I tried opening other files, but the result was the same: the error was always at the first column of the second line. Here are the first two lines of one file:

{"created_at":"Mon Jan 14 21:59:12 +0000 2019","id":1084932973353467904,"id_str":"1084932973353467904","text":...,"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"iw"},"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"oren_haz","name":"\u05d0\u05d5\u05e8\u05df \u05d7\u05d6\u05df","id":3185038236,"id_str":"3185038236","indices":[3,12]}],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"iw","timestamp_ms":"1547503152584"}
{"created_at":"Mon Jan 14 21:59:34 +0000 2019","id":1084933066898968576,"id_str":"1084933066898968576","text":"...,"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"iw"},"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"dudiamsalem","name":"\u05d3\u05d5\u05d3\u05d9 \u05d0\u05de\u05e1\u05dc\u05dd\u2066\ud83c\uddee\ud83c\uddf1\u2069\u2066","id":3221813461,"id_str":"3221813461","indices":[3,15]}],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"iw","timestamp_ms":"1547503174887"}

Thank you for your help.

Upvotes: 0

Views: 600

Answers (2)

tdelaney

Reputation: 77407

It appears that the text file has one JSON document per line and that each line should become a row in your DataFrame. You can build the DataFrame by parsing each line separately:

    with open(files_path+'/tweets.json.2019-01-15.txt') as f:
        tweet_df = pd.DataFrame([json.loads(line) for line in f])
    print(tweet_df)
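If the file happens to contain blank lines (for example a trailing empty line), json.loads would fail on them with a different error. A minimal sketch of a more defensive variant, assuming one JSON document per non-empty line; the helper name load_json_lines and the sample data are made up for illustration and are not part of the original code:

```python
import io
import json


def load_json_lines(lines):
    """Parse an iterable of text lines holding one JSON document each."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines; json.loads("") would raise JSONDecodeError.
            continue
        records.append(json.loads(line))
    return records


# Made-up stand-in for the tweets file (hypothetical data, not real tweets).
sample = io.StringIO('{"id": 1, "lang": "iw"}\n\n{"id": 2, "lang": "iw"}\n')
records = load_json_lines(sample)
print(len(records))
```

The returned list of dicts can be passed to pd.DataFrame in the same way as the list comprehension above. An open file object works in place of the StringIO, since both yield lines when iterated.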

Upvotes: 1

Tim Roberts

Reputation: 54812

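That's not a single JSON document. It is a series of separate JSON documents, one per line. json.loads expects exactly one document, so it reports "Extra data" as soon as it reaches the second line. Instead of using string=f.read(), you need to loop over the file and parse each line separately, like: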

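    with open(files_path+'/tweets.json.2019-01-15.txt') as f:
        for line in f:
            # each line holds one complete JSON document
            data = json.loads(line)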
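To see why the error points at line 2, column 1, here is a small sketch that reproduces it with two made-up documents (the sample text is an assumption standing in for the real file) and then parses them line by line:

```python
import json

# Two made-up documents separated by a newline, mimicking the file layout.
text = '{"id": 1}\n{"id": 2}\n'

try:
    # The parser reads the first document, then finds more text after it.
    json.loads(text)
    error = None
except json.JSONDecodeError as exc:
    error = exc

print(error)

# Parsing line by line treats each document on its own.
docs = [json.loads(line) for line in text.splitlines()]
print(docs)
```

The exception carries lineno and colno attributes, which is where the "line 2 column 1" in the message comes from: that is the position where the second document starts.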

Upvotes: 2
