Reputation: 9823
I'm not sure why, but when I load tweets from a JSON file into pandas, I get a lot of weird characters.
for file_name in files:
    if '.json' in file_name:
        file_path = WORKING_DIR + '/data/' + file_name
        # Reading the json as a dict
        with open(file_path) as json_d:
            data = json.load(json_d, encoding='utf8')
        json_df = pd.DataFrame.from_dict(data)
        dfs.append(json_df)
Upvotes: 0
Views: 66
Reputation: 7644
Try using encoding='utf-16' or encoding='utf-8':
for file_name in files:
    if '.json' in file_name:
        file_path = WORKING_DIR + '/data/' + file_name
        # Reading the json as a dict. In Python 3 the encoding goes on open(),
        # because json.load() no longer accepts an encoding argument.
        with open(file_path, encoding='utf-16') as json_d:
            data = json.load(json_d)
        json_df = pd.DataFrame.from_dict(data)
        dfs.append(json_df)
As @MYGz suggested, "u'�' means it failed to decode the character with 'utf-8'", so try a different encoding.
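If you don't know the file's encoding, one option is to try a few candidates in order and keep the first that decodes and parses cleanly. This is only a minimal sketch: the helper name `load_json_with_fallback` and the candidate list are my own assumptions, not part of the question, and it assumes Python 3, where the encoding is passed to `open()`.

```python
import json


def load_json_with_fallback(file_path, encodings=("utf-8-sig", "utf-16")):
    """Hypothetical helper: return (data, encoding) for the first encoding that works.

    The candidate list is an assumption; adjust it to the encodings you expect.
    """
    last_error = None
    for enc in encodings:
        try:
            # The encoding is given to open(); json.load() just parses the decoded text.
            with open(file_path, encoding=enc) as json_d:
                return json.load(json_d), enc
        except (UnicodeError, json.JSONDecodeError) as err:
            # Wrong encoding (or text that is not valid JSON): try the next candidate.
            last_error = err
    raise ValueError(
        "none of %r could decode %s" % (tuple(encodings), file_path)
    ) from last_error
```

Inside the loop you would then call `data, used_encoding = load_json_with_fallback(file_path)` before building the DataFrame. 'utf-8-sig' is listed first because it reads plain UTF-8 as well as UTF-8 with a byte order mark. Note that a wrong encoding can sometimes decode without raising an error, so it is still worth checking the resulting text.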
Upvotes: 1