Ahmad Anis

Reputation: 2704

TypeError: string indices must be integers in Pandas

I have a DataFrame with one column that holds JSON-like (dictionary-style) data as strings. To convert it into Python dictionaries, I found the following code on Stack Overflow:

class Iden():
  def __getitem__(name, index):
    return index

df['genres'].map(lambda a: eval(str(a), {}, Iden()))

This converts the strings into dictionaries, which I confirmed with the following code:

df['genres'].map(lambda a: eval(str(a), {}, Iden())).map(lambda a: type(a[0]))

and the output was:

0       <class 'dict'>
1       <class 'dict'>
2       <class 'dict'>
3       <class 'dict'>
4       <class 'dict'>
             ...  

A single value of genres looks like this:

"[{'id': 35, 'name': 'Comedy'}]"

I want to extract the name from this. My code is:

df['genres'].map(lambda a: eval(str(a), {}, Iden())).map(lambda a: a[0]['name'])

but it raises this error:

--------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-109-3da37181d7d9> in <module>
----> 1 df['genres'].map(lambda a: eval(str(a), {}, Iden())).map(lambda a: a[0]['name'])

C:\Anaconda\envs\myenv\lib\site-packages\pandas\core\series.py in map(self, arg, na_action)
   3628         dtype: object
   3629         """
-> 3630         new_values = super()._map_values(arg, na_action=na_action)
   3631         return self._constructor(new_values, index=self.index).__finalize__(self)
   3632 

C:\Anaconda\envs\myenv\lib\site-packages\pandas\core\base.py in _map_values(self, mapper, na_action)
   1143 
   1144         # mapper is a function
-> 1145         new_values = map_f(values, mapper)
   1146 
   1147         return new_values

pandas\_libs\lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-109-3da37181d7d9> in <lambda>(a)
----> 1 df['genres'].map(lambda a: eval(str(a), {}, Iden())).map(lambda a: a[0]['name'])

TypeError: string indices must be integers

The first 5 rows of genres are:

0                       [{'id': 35, 'name': 'Comedy'}]
1    [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...
2                        [{'id': 18, 'name': 'Drama'}]
3    [{'id': 53, 'name': 'Thriller'}, {'id': 18, 'n...
4    [{'id': 28, 'name': 'Action'}, {'id': 53, 'nam...

What could be the reason for this error, and how can I fix it?

Upvotes: 1

Views: 3290

Answers (2)

jezrael

Reputation: 862611

Use ast.literal_eval to convert the strings into lists of dictionaries, then select the first element of each list by indexing with str[0] and its name with Series.str.get. This returns missing values if the list has no first element or the dictionary has no name key:

import ast
    
df['genres'].map(ast.literal_eval).str[0].str.get('name')
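
One possible explanation for the original error, which the post does not confirm: if some rows are missing (NaN), str(a) gives "nan", and the Iden trick from the question evaluates that unknown name to the string 'nan'. Indexing it with [0] then gives 'n', and 'n'['name'] raises the TypeError. The sketch below reproduces this with plain Python and adds a hypothetical helper, first_genre_name (not part of pandas or of the original code), that skips rows it cannot parse:

```python
import ast


class Iden:
    # Same trick as in the question: any name looked up during eval
    # evaluates to its own string.
    def __getitem__(self, name):
        return name


# Assumed missing row: str(float("nan")) is "nan", which eval treats as a name.
parsed = eval(str(float("nan")), {}, Iden())

# parsed is the string 'nan', so parsed[0] is 'n' and indexing it
# with 'name' raises TypeError.
try:
    parsed[0]["name"]
    error = None
except TypeError as exc:
    error = exc


def first_genre_name(raw):
    """Return the first genre name, or None when it cannot be extracted."""
    if not isinstance(raw, str):
        return None  # e.g. NaN for a missing row
    try:
        genres = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return None  # not a valid Python literal
    if isinstance(genres, list) and genres and isinstance(genres[0], dict):
        return genres[0].get("name")
    return None
```

With a helper like this, df['genres'].map(first_genre_name) would be one way to apply it to every row.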

Upvotes: 0

Be Chiller Too

Reputation: 2910

You can simply use JSON:

import json
s = "[{'id': 35, 'name': 'Comedy'}]"
s = s.replace("'", '"')
l = json.loads(s)

l[0]["name"] # -> 'Comedy'

So you would do something like:

df["genres"].apply(lambda s: json.loads(s.replace("'", '"'))[0]["name"])
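
One caveat worth checking before relying on the quote replacement: it assumes no value contains an apostrophe. The sketch below uses a made-up value ("Children's" is an illustration, not taken from the question's data) to show the replacement producing invalid JSON:

```python
import json

# The value from the question parses fine after swapping the quotes.
ok = "[{'id': 35, 'name': 'Comedy'}]"
first = json.loads(ok.replace("'", '"'))[0]["name"]

# Hypothetical value with an apostrophe inside the name.
tricky = "[{'id': 1, 'name': \"Children's\"}]"
try:
    # The apostrophe also becomes a double quote, ending the string early.
    json.loads(tricky.replace("'", '"'))
    failed = False
except json.JSONDecodeError:
    failed = True
```

If such values can occur, parsing the original string as a Python literal avoids the problem, since no quote rewriting is needed.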

Upvotes: 1
