Reputation: 777
I have a perfectly normal pandas dataframe which I create after loading this dataset: https://www.kaggle.com/tmdb/tmdb-movie-metadata/data
As you can see, the genres column contains a nested structure which appears to be a list of dictionaries, or json objects depending on how you see it? The keys of these dictionaries are 'id' and 'name'.
Anyways, I have tried everything including transforming the column into a json with tojson(), or using pandas json_normalize() method without any luck.
If I use json_normalize() I get an AttributeError: 'str' object has no attribute 'itervalues':
pd.io.json.json_normalize(obj_movies['genres'], meta = ['id','name'])
In reality, my goal would be to parse this list to create a set of unique genres names for each row...
Upvotes: 1
Views: 1317
Reputation: 863166
Use:
import ast
obj_movies = pd.read_csv('tmdb_5000_movies.csv')
obj_movies['uniq'] = [list(set([y['name'] for y in x])) for x in obj_movies['genres'].apply(ast.literal_eval)]
print (obj_movies[['uniq'] ].head(10))
uniq
0 [Fantasy, Science Fiction, Adventure, Action]
1 [Fantasy, Adventure, Action]
2 [Crime, Adventure, Action]
3 [Drama, Crime, Thriller, Action]
4 [Science Fiction, Adventure, Action]
5 [Fantasy, Adventure, Action]
6 [Family, Animation]
7 [Science Fiction, Adventure, Action]
8 [Fantasy, Family, Adventure]
9 [Fantasy, Adventure, Action]
Upvotes: 1