Reputation: 11
I have a dataframe with a column that looks like this:
import pandas as pd
d = {'genres':
[ [
{"id": 10751,"name": "Family"},
{"id": 16, "name": "Animation"},
{"id": 12, "name": "Adventure"},
{"id": 35, "name": "Comedy"}],
[
{"id": 878, "name": "Science Fiction"},
{"id": 12, "name": "Adventure"},
{"id": 53, "name": "Thriller"}]]}
df_input = pd.DataFrame(data=d)
I need the following output:
d = {'genres':
[ ["Family", "Animation", "Adventure", "Comedy",],
["Science Fiction", "Adventure", "Thriller"]]}
df_output = pd.DataFrame(data=d)
Upvotes: 0
Views: 58
Reputation: 862581
You can extract the values from the dictionaries with a list comprehension inside Series.apply:
df_input['genres'] = df_input['genres'].apply(lambda x:[y['name'] for y in x])
print (df_input)
genres
0 [Family, Animation, Adventure, Comedy]
1 [Science Fiction, Adventure, Thriller]
Or by nested list comprehension:
df_input['genres'] = [[y['name'] for y in x] for x in df_input['genres']]
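As a quick self-contained check, the two variants above can be run side by side on the sample data from the question. This is only a sketch; the names `names_apply` and `names_listcomp` are made up for illustration, and the results are kept in separate variables so the original column is not overwritten.

```python
import pandas as pd

# Sample data copied from the question.
d = {'genres': [
    [{"id": 10751, "name": "Family"},
     {"id": 16, "name": "Animation"},
     {"id": 12, "name": "Adventure"},
     {"id": 35, "name": "Comedy"}],
    [{"id": 878, "name": "Science Fiction"},
     {"id": 12, "name": "Adventure"},
     {"id": 53, "name": "Thriller"}]]}
df_input = pd.DataFrame(data=d)

# Variant 1: Series.apply with a list comprehension (returns a Series).
names_apply = df_input['genres'].apply(lambda x: [y['name'] for y in x])

# Variant 2: nested list comprehension (returns a plain list of lists).
names_listcomp = [[y['name'] for y in x] for x in df_input['genres']]

# Both variants should give the same nested lists.
print(names_apply.tolist() == names_listcomp)
```

Either result can be assigned back to `df_input['genres']`, as in the snippets above.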
EDIT: If the real data contains strings, not dicts, use:
import json, ast
df_input['genres'] = df_input['genres'].apply(lambda x:[y['name'] for y in ast.literal_eval(x)])
Or:
df_input['genres'] = df_input['genres'].apply(lambda x:[y['name'] for y in json.loads(x)])
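Which of the two parsers fits depends on how the strings are quoted. The sketch below uses two hypothetical one-row frames (`df_json` and `df_repr` are not from the question): one holds valid JSON with double quotes, the other holds the Python repr of a list with single quotes. `json.loads` only accepts the former, while `ast.literal_eval` handles Python literals such as the latter.

```python
import ast
import json

import pandas as pd

# Hypothetical sample data: each cell is a string, not a list of dicts.
df_json = pd.DataFrame(
    {'genres': ['[{"id": 16, "name": "Animation"}, {"id": 35, "name": "Comedy"}]']})
df_repr = pd.DataFrame(
    {'genres': ["[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]"]})

# Double-quoted strings are valid JSON, so json.loads can parse them.
from_json = df_json['genres'].apply(lambda x: [y['name'] for y in json.loads(x)])

# Single-quoted strings are Python literals, so ast.literal_eval is needed.
from_repr = df_repr['genres'].apply(lambda x: [y['name'] for y in ast.literal_eval(x)])

# json.loads raises on the single-quoted form.
try:
    json.loads(df_repr['genres'].iloc[0])
    json_accepts_single_quotes = True
except json.JSONDecodeError:
    json_accepts_single_quotes = False

print(from_json.tolist(), from_repr.tolist(), json_accepts_single_quotes)
```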
Upvotes: 2
Reputation: 221
If you want to do this with pandas, you can use the apply method. Start by writing a function that returns the "name" value of each element:
>>> def getNames(x):
...     return [xi["name"] for xi in x]
Now all you need to do is apply it to the column in your dataframe:
>>> df = pd.DataFrame(data=d)
>>> d_out = df['genres'].apply(getNames) # This returns the output that you want
>>> df_output = pd.DataFrame(data=d_out, columns=["genres"])
>>> df_output
genres
0 [Family, Animation, Adventure, Comedy]
1 [Science Fiction, Adventure, Thriller]
There could be shorter ways.
Upvotes: 0