Reputation: 319
I am trying to extract the text from a .json file that I extracted. The problem is that every time I try, I get the error in the title. Here is my code:
import json
with open('grtwe.json.json', 'r') as f:
    line = f.readline()
    tweet = json.loads(line)
    print(json.dumps(tweet, indent=4))
Also, the tweets are in Greek.
The first line of my .json file is this:
{"place": null, "geo": null, "source": "<a href=\"" rel=\"nofollow\">Twitter Lite</a>", "id_str": "967369573505921024", "favorite_count": 0, "in_reply_to_status_id": null, "favorited": false, "in_reply_to_user_id": null, "in_reply_to_status_id_str": null, "contributors": null, "is_quote_status": false, "full_text": "RT @documentonews:#Novartis_gate\n\u0391\u03c0\u03bf\u03ba\u03ac\u03bb\u03c5\u03c8\u03b7-\u03c3\u03bf\u03ba: \u039a\u03b1\u03b9 \u03c4\u03c1\u03af\u03c4\u03bf\u03c2 \u03bd\u03b5\u03ba\u03c1\u03cc\u03c2 \u03c3\u03c4\u03bf \u03b4\u03c1\u03cc\u03bc\u03bf \u03c4\u03b7\u03c2 Novartis, \u03c3\u03c4\u03bf Documento \u03c0\u03bf\u03c5 \u03ba\u03c5\u03ba\u03bb\u03bf\u03c6\u03bf\u03c1\u03b5\u03af \u03c4\u03b7\u03bd \u039a\u03c5\u03c1\u03b9\u03b1\u03ba\u03ae | https\u2026", "truncated": false, "user": {"notifications": false, "is_translator": false, "profile_image_url": "", "profile_background_tile": false, "id_str": "387685829", "geo_enabled": false, "profile_image_url_https":"", "statuses_count": 47093, "screen_name": "satrapis21", "is_translation_enabled": false, "followers_count": 1692, "has_extended_profile": false, "profile_background_image_url_https": "", "url": null, "follow_request_sent": false, "profile_sidebar_border_color": "FFFFFF", "profile_use_background_image": true, "profile_link_color": "D02B55", "profile_text_color": "3E4415", "description":"\u03be\u03b5\u03bd\u03bf\u03b4\u03bf\u03c7\u03bf\u03c2 \u03b3\u03ba\u03bf\u03c5\u03bb\u03b1\u03b3\u03ba \u03b5\u03c0\u03b5\u03bd\u03b4\u03c5\u03c4\u03b7\u03c2", "profile_background_color": "352726", "id": 387685829, "friends_count": 1689, "favourites_count": 3380, "created_at": "Sun Oct 09 14:01:48 +0000 2011", "default_profile": false, "translator_type": "none", "entities": {"description": {"urls": []}}, "profile_sidebar_fill_color": "99CC33", "default_profile_image": false, "listed_count": 39, "profile_banner_url": "","following": false, "utc_offset": 7200, "protected": false, "verified": false, "name": 
"\u03ba\u03bf\u03c5\u03bb\u03b7\u03c2satrapis", "profile_background_image_url":"", "time_zone": "Vilnius", "lang": "el", "contributors_enabled": false,"location": ""}, "metadata": {"result_type": "recent", "iso_language_code": "el"}, "id": 967369573505921024, "in_reply_to_screen_name": null, "created_at": "Sat Feb 2412:04:13 +0000 2018", "display_text_range": [0, 140], "retweeted": false, "in_reply_to_user_id_str": null, "lang": "el", "coordinates": null, "retweeted_status": {"place": null, "geo": null, "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>", "id_str": "967369433864863744", "favorite_count": 13, "in_reply_to_status_id": null, "favorited": false, "in_reply_to_user_id": null, "in_reply_to_status_id_str": null, "contributors": null, "is_quote_status": false,"full_text": "#Novartis_gate\n\u0391\u03c0\u03bf\u03ba\u03ac\u03bb\u03c5\u03c8\u03b7-\u03c3\u03bf\u03ba: \u039a\u03b1\u03b9 \u03c4\u03c1\u03af\u03c4\u03bf\u03c2 \u03bd\u03b5\u03ba\u03c1\u03cc\u03c2 \u03c3\u03c4\u03bf \u03b4\u03c1\u03cc\u03bc\u03bf \u03c4\u03b7\u03c2 Novartis, \u03c3\u03c4\u03bf Documento \u03c0\u03bf\u03c5 \u03ba\u03c5\u03ba\u03bb\u03bf\u03c6\u03bf\u03c1\u03b5\u03af \u03c4\u03b7\u03bd \u039a\u03c5\u03c1\u03b9\u03b1\u03ba\u03ae | ","truncated": false, "user": {"notifications": false, "is_translator": false, "profile_image_url": "", "profile_background_tile": false, "id_str": "795738344906952705", "geo_enabled": false, "profile_image_url_https": "", "statuses_count": 39383, "screen_name": "documentonews", "is_translation_enabled": false, "followers_count": 4607, "has_extended_profile": false, "profile_background_image_url_https": null, "url": "", "follow_request_sent": false, "profile_sidebar_border_color": "C0DEED", "profile_use_background_image": true, "profile_link_color": "1DA1F2", "profile_text_color": "333333", "description": "H \u039d\u03ad\u03b1 \u039c\u03b5\u03b3\u03ac\u03bb\u03b7 \u039a\u03c5\u03c1\u03b9\u03b1\u03ba\u03ac\u03c4\u03b9\u03ba\u03b7 
\u0395\u03c6\u03b7\u03bc\u03b5\u03c1\u03af\u03b4\u03b1", "profile_background_color": "F5F8FA", "id": 795738344906952705, "friends_count": 180, "favourites_count": 0, "created_at": "Mon Nov 07 21:23:00 +0000 2016", "default_profile": true, "translator_type": "none", "entities": {"url": {"urls": [{"url": "", "display_url": "documentonews.gr", "expanded_url": "", "indices": [0, 23]}]}, "description": {"urls": []}}, "profile_sidebar_fill_color": "DDEEF6", "default_profile_image": false, "listed_count": 69, "profile_banner_url": "", "following": false,"utc_offset": 7200, "protected": false, "verified": false, "name": "Documento", "profile_background_image_url": null, "time_zone": "Athens", "lang": "en", "contributors_enabled": false, "location": "Greece"},"metadata": {"result_type": "recent", "iso_language_code": "el"}, "id": 967369433864863744, "in_reply_to_screen_name": null, "created_at": "Sat Feb 24 12:03:40 +0000 2018", "display_text_range": [0, 162],"retweeted": false, "in_reply_to_user_id_str": null, "lang": "el", "coordinates": null, "entities": {"hashtags": [{"text": "Novartis_gate", "indices": [0, 14]}], "user_mentions": [], "symbols": [], "urls": [{"url": "", "display_url": "Documentonews.gr", "expanded_url": "", "indices": [115, 138]}, {"url": "", "display_url":"documentonews.gr/article/apokal\u2026", "expanded_url": "", "indices": [139, 162]}]}, "possibly_sensitive": false, "retweet_count": 10}, "entities": {"hashtags": [{"text": "Novartis_gate", "indices": [19, 33]}], "user_mentions": [{"name": "Documento", "id": 795738344906952705,"screen_name": "documentonews", "id_str": "795738344906952705", "indices": [3, 17]}], "symbols": [], "urls": []}, "possibly_sensitive": false, "retweet_count": 10}
The rest of the file contains similar records.
Upvotes: 0
Views: 8265
Reputation: 1
json.loads() raised the error, so the problem is in the JSON file itself, which does not appear to be valid JSON.
You need to make sure that the file is in the correct JSON format.
You could go to https://jsonlint.com and paste your JSON string there, and it will tell you whether it is valid.
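The same check can be done locally, since json.loads() reports where parsing stopped. Below is a minimal sketch; the helper name describe_json_error is made up for illustration, and the sample string is only modelled on the "source" field shown in the question, so it may not match what is actually in the file.

```python
import json

def describe_json_error(text):
    """Return None if text parses as JSON, else a short note on where parsing failed."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # msg, lineno, colno and pos are attributes of json.JSONDecodeError
        return f"{err.msg} at line {err.lineno}, column {err.colno} (char {err.pos})"
    return None

# Assumption: a fragment shaped like the "source" field in the question, where
# the escaped quote is followed by an unescaped one that ends the string early.
sample = '{"source": "<a href=\\"" rel=\\"nofollow\\">Twitter Lite</a>"}'
print(describe_json_error(sample))
```

If the reported column points into the middle of a field, that field is a good place to start looking for a stray quote or a missing comma.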
Upvotes: 0
Reputation: 20414
This is most likely because you are trying to parse only the first line of the file (since you call json.loads() on f.readline()). It seems more probable that your whole file is JSON, in which case you want to pass the whole thing in one go.
with open('grtwe.json.json', 'r') as f:
    tweet = json.loads(f.read())
    print(json.dumps(tweet, indent=4))
However, I obviously can't check without the file!
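If the file instead holds one JSON object per line (the question says the rest of the file contains such records), reading everything in one go would probably fail with an "Extra data" error. A rough sketch for that case follows. The function name load_json_lines is hypothetical, and UTF-8 encoding is an assumption about the file.

```python
import json

def load_json_lines(path):
    """Parse a file with one JSON object per line; return (records, errors)."""
    records, errors = [], []
    # Assumption: the file is UTF-8 encoded
    with open(path, 'r', encoding='utf-8') as f:
        for number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Remember which line failed instead of stopping at the first error
                errors.append((number, str(err)))
    return records, errors
```

With this, records, errors = load_json_lines('grtwe.json.json') collects every tweet that parses and lists the line numbers of those that do not. Printing with json.dumps(records[0], indent=4, ensure_ascii=False) should show the Greek text as written rather than as \u escapes.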
Upvotes: 1