Odisseo
Odisseo

Reputation: 777

Extract list of JSON objects in string form from Pandas Dataframe column

I have a perfectly normal pandas dataframe which I create after loading this dataset: https://www.kaggle.com/tmdb/tmdb-movie-metadata/data

As you can see, the genres column contains a nested structure which appears to be a list of dictionaries, or json objects depending on how you see it? The keys of these dictionaries are 'id' and 'name'.

Anyways, I have tried everything including transforming the column into a json with tojson(), or using pandas json_normalize() method without any luck.

If I use json_normalize() I get an AttributeError: 'str' object has no attribute 'itervalues':

pd.io.json.json_normalize(obj_movies['genres'], meta = ['id','name']) 

In reality, my goal would be to parse this list to create a set of unique genres names for each row...

Upvotes: 1

Views: 1317

Answers (1)

jezrael
jezrael

Reputation: 863166

Use:

import ast

obj_movies = pd.read_csv('tmdb_5000_movies.csv')
obj_movies['uniq'] = [list(set([y['name'] for y in x])) for x in obj_movies['genres'].apply(ast.literal_eval)]
print (obj_movies[['uniq'] ].head(10))
                                            uniq
0  [Fantasy, Science Fiction, Adventure, Action]
1                   [Fantasy, Adventure, Action]
2                     [Crime, Adventure, Action]
3               [Drama, Crime, Thriller, Action]
4           [Science Fiction, Adventure, Action]
5                   [Fantasy, Adventure, Action]
6                            [Family, Animation]
7           [Science Fiction, Adventure, Action]
8                   [Fantasy, Family, Adventure]
9                   [Fantasy, Adventure, Action]

Upvotes: 1

Related Questions