I'm using Python (pandas) to read a JSON file of raw tweets, but I'm getting the following error:
ValueError: Unexpected character found when decoding array value (2)
I would appreciate any help.
EDIT: Here is a sample of the JSON:
{"created_at":"Sat Nov 16 14:15:52 +0000 2019","id":1195707056365461505,"id_str":"1195707056365461505","text":"Any arsenal red members on here, dm me please...got a couple questions\ud83d\ude05\ud83e\udd14","source":"\u003ca href=\"http://twitter.com/download/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":974846850,"id_str":"974846850","name":"Rico Rodrigo","screen_name":"DatGuyTy_online","location":"Brum","url":null,"description":"Aspiring Accountant x Arsenal enthusiast x Anime addict","translator_type":"none","protected":false,"verified":false,"followers_count":647,"friends_count":901,"listed_count":9,"favourites_count":24989,"statuses_count":24628,"created_at":"Tue Nov 27 22:25:31 +0000 2012","utc_offset":null,"time_zone":null,"geo_enabled":true,"lang":null,"contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/1071377159682514945/Np4nGX5m_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/1071377159682514945/Np4nGX5m_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/974846850/1554183093","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"ent
ities":{"hashtags":[],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en","timestamp_ms":"1573913752057"}
This is the code I'm using to read the file:
import numpy as np
import pandas as pd
import re
import matplotlib.pyplot as plt
import json
import os
tweet_file = 'raw_data.json'
tweets = pd.read_json(tweet_file, convert_dates=True, lines=True, encoding='utf-8')
Upvotes: 1
Views: 1105
Reputation: 9512
I hit this error with my own JSON file when reading it with pandas:
File ~/.local/lib/python3.9/site-packages/pandas/io/json/_json.py:1133, in FrameParser._parse_no_numpy(self)
1129 orient = self.orient
1131 if orient == "columns":
1132 self.obj = DataFrame(
-> 1133 loads(json, precise_float=self.precise_float), dtype=None
1134 )
1135 elif orient == "split":
1136 decoded = {
1137 str(k): v
1138 for k, v in loads(json, precise_float=self.precise_float).items()
1139 }
ValueError: Unexpected character found when decoding array value (1)
I then opened the file in VS Code as JSON, checked line 2, column 914, and found a tab instead of spaces after that column.
To fix this, I used a regex to replace all tabs with four spaces.
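The exact regex is not shown here, so the following is only a minimal sketch of such a replacement in Python. It assumes the tab sits inside a string value, where a raw tab is not valid JSON; the helper name `replace_tabs` and the sample record are hypothetical.

```python
import json
import re


def replace_tabs(text: str, width: int = 4) -> str:
    """Replace every literal tab character with `width` spaces (hypothetical helper)."""
    return re.sub(r"\t", " " * width, text)


# Made-up sample: Python turns \t into a real tab inside the JSON string value.
raw = '{"text": "a\tb"}'

# The standard-library parser rejects raw control characters inside strings.
try:
    json.loads(raw)
    parsed_before_fix = True
except json.JSONDecodeError:
    parsed_before_fix = False

# After the replacement the same record parses.
cleaned = replace_tabs(raw)
record = json.loads(cleaned)
print(parsed_before_fix, repr(record["text"]))
```

For a real file, the same function could be applied to the file's text before writing it back or passing it to the parser. Note that this changes the content of any string that contained a tab.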
Side remark: my JSON had many hardcoded \n line breaks, and I thought I would have to drop them as well, but they do no harm; you can keep them.
You may find other red markers in the JSON view of VS Code or some other JSON editor. I also ran into a JSONDecodeError; both fixes are covered together in the question "How can I fix the error 'JSONDecodeError: Expecting value: ...' when loading a json file with json.load()?".
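If you would rather locate the offending positions without an editor, here is a rough sketch that parses a line-delimited file one record at a time with the standard `json` module and collects the line and column of each failure. The function name `find_bad_lines` and the sample records are made up for illustration, and pandas' own parser may report positions and messages differently.

```python
import json


def find_bad_lines(lines):
    """Return (line_number, column, message) for each line that fails to parse."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            # colno and msg are attributes of json.JSONDecodeError
            problems.append((number, err.colno, err.msg))
    return problems


# Made-up records: a valid one, one with a raw tab in a string, one cut off early.
sample = [
    '{"id": 1, "text": "ok"}',
    '{"id": 2, "text": "bad\there"}',
    '{"id": 3',
]
problems = find_bad_lines(sample)
for number, column, message in problems:
    print(f"line {number}, column {column}: {message}")
```

An open file handle can be passed in place of the list, for example `find_bad_lines(open("raw_data.json", encoding="utf-8"))`. A record that is split across two lines shows up as two consecutive failures.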