Kbbm

Reputation: 375

'int' object is not subscriptable after if statement

So I have a dataframe:

import pandas as pd

df = pd.DataFrame({'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
        'score': [1, 3, 4, 5, 2]})

And I want to create a new column based on the conditions in the 'score' column.

I tried it like this:

df['happiness'] = df['score']
def are_you_ok(df):
    if df['happiness'] >= 4:
        return 'happy',
    elif df['happiness'] <= 2:
        return 'sad',
    else:
        return 'ok'

df['happines'] = df['happiness'].apply(are_you_ok)
df

When I try to run that though, all I get is:

TypeError: 'int' object is not subscriptable

Can I not use this kind of function with an integer?

Upvotes: 2

Views: 352

Answers (3)

BENY

Reputation: 323306

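Using pd.cut. With integer scores, the bins (0, 2], (2, 3] and (3, inf] match your conditions (<= 2 is sad, >= 4 is happy):

import numpy as np

pd.cut(df.score,[0,2,3,np.inf],labels=['sad','ok','happy'])
Out[594]: 
0      sad
1       ok
2    happy
3    happy
4      sad

#df['yourcol']=pd.cut(df.score,[0,2,3,np.inf],labels=['sad','ok','happy'])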
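
To check which interval each score falls into, here is a minimal sketch. It assumes the scores are whole numbers, so that "greater than 3" means the same as ">= 4" from the question. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

scores = pd.Series([1, 3, 4, 5, 2], name='score')

# Assumption: integer scores, so the right-closed bins
# (0, 2], (2, 3], (3, inf] mean "<= 2", "== 3" and ">= 4".
bins = [0, 2, 3, np.inf]

# Without labels, pd.cut returns the interval each value lands in.
intervals = pd.cut(scores, bins)
labels = pd.cut(scores, bins, labels=['sad', 'ok', 'happy'])

print(pd.DataFrame({'score': scores, 'interval': intervals, 'label': labels}))
```

Note that pd.cut closes each bin on the right by default, which is why the middle edge is 3 and not 4 here.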

Upvotes: 1

timgeb

Reputation: 78700

The problem is that apply applies your function to every single value in the column. df is not a DataFrame inside of are_you_ok, but (in your case) an integer. Naturally, Python is complaining that you cannot index into integers with ['happiness'].
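
A quick way to see this for yourself, as a rough sketch using the DataFrame from the question (the helper name show_type is made up for illustration): have the applied function report the type of its argument, then try indexing a plain integer the way the original function does.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
                   'score': [1, 3, 4, 5, 2]})

def show_type(value):
    # Hypothetical helper: report what Series.apply passes in.
    return type(value).__name__

# Each call receives one element of the column, not the DataFrame.
seen = df['score'].apply(show_type)
print(seen.unique())

# Indexing a single integer by name fails just like in the question.
value = 5
try:
    value['happiness']
except TypeError as exc:
    message = str(exc)
    print(message)
```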

Your code is quite easy to fix, though. Just rewrite are_you_ok such that it works with integer arguments.

In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
   ...:         'score': [1, 3, 4, 5, 2]})
   ...:         
In [3]: def are_you_ok(x):
   ...:     if x >= 4:
   ...:         return 'happy'
   ...:     elif x <= 2:
   ...:         return 'sad'
   ...:     else:
   ...:         return 'ok'
   ...:     
In [4]: df['happiness'] = df['score'].apply(are_you_ok)
In [5]: df
Out[5]: 
    name  score happiness
0  Jason      1       sad
1  Molly      3        ok
2   Tina      4     happy
3   Jake      5     happy
4    Amy      2       sad
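
If you would rather keep a function that looks values up by column name, one alternative (a sketch, not necessarily what you want for large data) is to call apply on the whole DataFrame with axis=1. Each row is then passed in as a Series, so indexing with a column label works. The name are_you_ok_row is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
                   'score': [1, 3, 4, 5, 2]})

def are_you_ok_row(row):
    # `row` is a Series holding one row, so row['score'] is valid.
    if row['score'] >= 4:
        return 'happy'
    elif row['score'] <= 2:
        return 'sad'
    else:
        return 'ok'

# axis=1 applies the function to each row instead of each column.
df['happiness'] = df.apply(are_you_ok_row, axis=1)
print(df)
```

Row-wise apply is usually slower than applying a function to a single column, so the version that takes a plain integer is the simpler choice when only one column is involved.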

Upvotes: 0

sacuL

Reputation: 51345

Sounds like you want np.select from numpy:

import numpy as np

conds = [df.score >= 4, df.score <= 2]

choices = ['happy', 'sad']

df['happiness'] = np.select(conds, choices, default='ok')

>>> df
    name  score happiness
0  Jason      1       sad
1  Molly      3        ok
2   Tina      4     happy
3   Jake      5     happy
4    Amy      2       sad

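Note: older pandas versions let you skip the explicit numpy import by using pandas.np (or pd.np, depending on how you imported pandas). That alias was deprecated in pandas 1.0 and removed in pandas 2.0, so on current versions import numpy directly as shown above.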

Upvotes: 2
