Reputation: 2704
I have a DataFrame with one column holding JSON-like (dictionary-style) data stored as strings. To convert it into Python dictionaries, I came up with the following code from Stack Overflow:
class Iden:
    # Used as eval's locals: every name lookup returns the name itself
    def __getitem__(self, name):
        return name
df['genres'].map(lambda a: eval(str(a), {}, Iden()))
which converts the strings into dictionaries, as I confirmed with the following code:
df['genres'].map(lambda a: eval(str(a), {}, Iden())).map(lambda a: type(a[0]))
and the output was:
0 <class 'dict'>
1 <class 'dict'>
2 <class 'dict'>
3 <class 'dict'>
4 <class 'dict'>
...
Now, a single value of genres looks like this:
"[{'id': 35, 'name': 'Comedy'}]"
I want to extract the name from this, and my code is:
df['genres'].map(lambda a: eval(str(a), {}, Iden())).map(lambda a: a[0]['name'])
but it raises an error:
--------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-109-3da37181d7d9> in <module>
----> 1 df['genres'].map(lambda a: eval(str(a), {}, Iden())).map(lambda a: a[0]['name'])
C:\Anaconda\envs\myenv\lib\site-packages\pandas\core\series.py in map(self, arg, na_action)
3628 dtype: object
3629 """
-> 3630 new_values = super()._map_values(arg, na_action=na_action)
3631 return self._constructor(new_values, index=self.index).__finalize__(self)
3632
C:\Anaconda\envs\myenv\lib\site-packages\pandas\core\base.py in _map_values(self, mapper, na_action)
1143
1144 # mapper is a function
-> 1145 new_values = map_f(values, mapper)
1146
1147 return new_values
pandas\_libs\lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-109-3da37181d7d9> in <lambda>(a)
----> 1 df['genres'].map(lambda a: eval(str(a), {}, Iden())).map(lambda a: a[0]['name'])
TypeError: string indices must be integers
The first 5 rows of genres are:
0 [{'id': 35, 'name': 'Comedy'}]
1 [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...
2 [{'id': 18, 'name': 'Drama'}]
3 [{'id': 53, 'name': 'Thriller'}, {'id': 18, 'n...
4 [{'id': 28, 'name': 'Action'}, {'id': 53, 'nam...
What could be the reason for this error, and how can I fix it?
Upvotes: 1
Views: 3290
Reputation: 862611
Use ast.literal_eval to convert the strings to lists of dictionaries, then select the first element of each list by indexing with str[0] and its name value with Series.str.get. This returns missing values if the list is empty or the dictionary has no name key:
import ast
df['genres'].map(ast.literal_eval).str[0].str.get('name')
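One possible explanation for the TypeError in the question, offered as an assumption since the full column is not shown: with the Iden trick, any bare name in a cell (for example the text nan that str() produces for a missing value) evaluates to a string instead of a list, so a[0]['name'] ends up indexing a string. The sketch below reproduces that on a hypothetical 'nan' cell and then shows a guarded helper built on ast.literal_eval. The name first_genre_name and the sample cells are made up for illustration.

```python
import ast

class Iden:
    # Same trick as in the question: every name lookup returns the name itself.
    def __getitem__(self, name):
        return name

# Hypothetical missing cell: str(float("nan")) is the text "nan".
bad = eval(str(float("nan")), {}, Iden())
print(repr(bad))  # a str, not a list of dicts

error_message = None
try:
    bad[0]["name"]  # indexes a one-character string with 'name'
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", exc)

def first_genre_name(cell):
    """Return the first genre's name, or None if it cannot be found."""
    if not isinstance(cell, str):
        return None  # NaN / None cells
    try:
        genres = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return None  # text that is not a valid Python literal
    if isinstance(genres, list) and genres and isinstance(genres[0], dict):
        return genres[0].get("name")
    return None

# Assumed sample cells: a normal row, an empty list, a float NaN, the text "nan".
samples = ["[{'id': 35, 'name': 'Comedy'}]", "[]", float("nan"), "nan"]
names = [first_genre_name(cell) for cell in samples]
print(names)
```

On the question's DataFrame this would be applied as df['genres'].map(first_genre_name). Unlike the one-liner above, it does not raise on missing cells, at the cost of a Python-level function call per row.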
Upvotes: 0
Reputation: 2910
You can simply use json, after replacing the single quotes with double quotes so that the string is valid JSON:
import json
s = "[{'id': 35, 'name': 'Comedy'}]"
s = s.replace("'", '"')
l = json.loads(s)
l[0]["name"] # -> 'Comedy'
So you would do something like:
df["genres"].apply(lambda s: json.loads(s.replace("'", '"'))[0]["name"])
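A caveat worth checking against the real data: the quote replacement assumes no value contains an apostrophe. The sketch below uses a hypothetical cell (the id and name are invented) to show how such a value could break json.loads while ast.literal_eval still parses it.

```python
import ast
import json

# Hypothetical cell whose name contains an apostrophe; Python's repr
# switches to double quotes for such strings.
s = "[{'id': 10751, 'name': \"Children's\"}]"

try:
    # After the replace, the apostrophe becomes a stray double quote.
    json.loads(s.replace("'", '"'))
    json_ok = True
except json.JSONDecodeError as exc:
    json_ok = False
    print("json.loads failed:", exc)

# ast.literal_eval reads the original Python-literal syntax directly.
parsed = ast.literal_eval(s)
print(parsed[0]["name"])
```

If none of the names in the column contain quotes, the json approach above works as written.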
Upvotes: 1