William Falcon

Reputation: 9823

Pandas: weird u'�' characters when loading tweets from JSON

I'm not sure why, but when I load tweets from a JSON file into pandas, I get a lot of weird characters.

    import json
    import pandas as pd

    for file_name in files:
        if '.json' in file_name:
            file_path = WORKING_DIR + '/data/' + file_name

            # Reading the json as a dict
            with open(file_path) as json_d:
                data = json.load(json_d, encoding='utf8')
                json_df = pd.DataFrame.from_dict(data)
                dfs.append(json_df)

Upvotes: 0

Views: 66

Answers (1)

Shubham R

Reputation: 7644

Try opening the file with encoding='utf-16' or encoding='utf-8'.

    import json
    import pandas as pd

    for file_name in files:
        if '.json' in file_name:
            file_path = WORKING_DIR + '/data/' + file_name

            # Reading the json as a dict; the encoding goes to open(),
            # which decodes the file's bytes before json parses the text
            with open(file_path, encoding='utf-16') as json_d:
                data = json.load(json_d)
                json_df = pd.DataFrame.from_dict(data)
                dfs.append(json_df)
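If you don't know which encoding a file uses, a minimal sketch (Python 3, standard library only) can try candidate encodings in order and keep the first one that both decodes and parses. The helper name `load_json_with_fallback` and the candidate list `("utf-8", "utf-16")` are hypothetical choices for illustration, not part of pandas or the `json` module. A wrong guess does not always fail at the decoding step; sometimes the bytes decode into nonsense and only the JSON parse fails, so both errors are caught.

```python
import json
import os
import tempfile


def load_json_with_fallback(file_path, encodings=("utf-8", "utf-16")):
    """Hypothetical helper: return (parsed_data, encoding_used).

    Tries each candidate encoding in order; the candidate list is an
    assumption and should be adjusted to the files you actually have.
    """
    last_error = None
    for enc in encodings:
        try:
            # The encoding is passed to open(), which turns the raw bytes
            # into text before json.load parses it.
            with open(file_path, encoding=enc) as json_d:
                return json.load(json_d), enc
        except (UnicodeDecodeError, json.JSONDecodeError) as err:
            last_error = err  # wrong guess; move on to the next encoding
    raise ValueError(f"could not decode {file_path} with any of {encodings}") from last_error


# Usage: write a small UTF-16 file, then load it without stating its encoding.
tweets = [{"id": 1, "text": "caf\u00e9 \u2764"}]
with tempfile.NamedTemporaryFile("w", encoding="utf-16", suffix=".json", delete=False) as tmp:
    json.dump(tweets, tmp, ensure_ascii=False)
    path = tmp.name

data, used = load_json_with_fallback(path)
os.remove(path)
print(used)  # utf-16
```

The returned `data` can then be passed to `pd.DataFrame.from_dict(data)` exactly as in the loop. Putting `"utf-8"` first is a deliberate choice: a UTF-16 file (which starts with a byte-order mark) fails fast under UTF-8, whereas the reverse order can occasionally mis-decode a UTF-8 file before the JSON parse catches it.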

As @MYGz suggested, "u'�' means it failed to decode the character with 'utf-8'", so try a different encoding.
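To illustrate what that comment means, here is a small sketch with made-up bytes: U+FFFD (displayed as �) is the replacement character Python's decoder substitutes when it is told to replace bytes it cannot decode. The Latin-1 source encoding below is only an assumption for the demonstration, not a claim about how the tweets were stored.

```python
# Hypothetical data: the word "café" stored as Latin-1 bytes, not UTF-8.
raw = "caf\u00e9".encode("latin-1")  # b'caf\xe9'

# Decoding with the wrong codec and errors="replace" swaps the bad byte for U+FFFD.
garbled = raw.decode("utf-8", errors="replace")
print(ascii(garbled))  # 'caf\ufffd'

# With the default errors="strict", the same mismatch raises instead.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print("strict decode failed:", err.reason)

# Decoding with the encoding the bytes were actually written in recovers the text.
fixed = raw.decode("latin-1")
print(ascii(fixed))  # 'caf\xe9'
```

Once a � has been substituted, the original byte is gone, so the fix has to happen at decode time by choosing the right encoding rather than by cleaning the DataFrame afterwards.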

Upvotes: 1
