TechnicalTim

Reputation: 155

TypeError: apply() missing 1 required positional argument: 'func'

When I try to create a new column with a function that is based on values in another column, I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-66-491e832a7dac> in <module>()
  4     return 'Other'
  5 
----> 6 df['PriceCatColumn'] = df.apply(PriceCat, axis=1)

TypeError: apply() missing 1 required positional argument: 'func'

This is the code:

def PriceCat(row):
    if row['Median ASP'] <= 50:
        return 'Category 1'
    return 'Other'

df['PriceCatColumn'] = df.apply(PriceCat, axis=1)

What am I doing wrong? I searched for solutions to this error, but none of the answers I found applied to my case.

Upvotes: 2

Views: 12145

Answers (2)

johng

Reputation: 21

Have PriceCat take a single value rather than a whole row, and apply it to the 'Median ASP' column directly:

def PriceCat(x):
    if x <= 50:
        return 'Category 1'
    else:
        return 'Other'

df['PriceCatColumn'] = df['Median ASP'].apply(PriceCat)

    X  Median ASP PriceCatColumn
0   1          10     Category 1
1   2          20     Category 1
2   3          30     Category 1
3   4          40     Category 1
4   5          50     Category 1
5   6          60          Other
6   7          70          Other
7   8          80          Other
8   9          90          Other
9  10         100          Other
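
For comparison, here is a minimal runnable sketch of both styles. The sample values and the names price_cat_value and price_cat_row are hypothetical, made up for illustration. It shows that Series.apply hands the function one value at a time, while DataFrame.apply with axis=1 hands it one row at a time, and that both give the same categories:

```python
import pandas as pd

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({'Median ASP': [10, 50, 60]})

def price_cat_value(x):
    # Called by Series.apply with a single scalar from the column
    return 'Category 1' if x <= 50 else 'Other'

def price_cat_row(row):
    # Called by DataFrame.apply(axis=1) with one row (a Series)
    return 'Category 1' if row['Median ASP'] <= 50 else 'Other'

by_value = df['Median ASP'].apply(price_cat_value)
by_row = df.apply(price_cat_row, axis=1)

print(by_value.tolist())
print(by_value.tolist() == by_row.tolist())
```

The column-wise version is usually the simpler choice when the function only needs one column.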

Upvotes: 1

sacuL

Reputation: 51335

Alternatives:

Use np.where instead if there are only 2 possible categories.

Example:

>>> df
   Median ASP
0           1
1           2
2          51
3          52
4           5

import numpy as np

df['PriceCatColumn'] = np.where(df['Median ASP'] <= 50, 'Category 1', 'Other')

>>> df
   Median ASP PriceCatColumn
0           1     Category 1
1           2     Category 1
2          51          Other
3          52          Other
4           5     Category 1

If there are more categories, use np.select. For instance:

conds = [df['Median ASP'] <= 3, df['Median ASP'] <= 50]
choices = ['Category 1', 'Category 2']
df['PriceCatColumn'] = np.select(conds, choices, default='Other')

>>> df
   Median ASP PriceCatColumn
0           1     Category 1
1           2     Category 1
2          51          Other
3          52          Other
4           5     Category 2
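
One thing to watch with np.select: the conditions are checked in order and the first match wins, so the narrowest condition should come first. A small sketch reusing the sample values from this answer (the names narrow_first and broad_first are mine, chosen only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Median ASP': [1, 2, 51, 52, 5]})

# Narrowest condition first: 1 and 2 satisfy both conditions,
# but they take the choice paired with the first match.
narrow_first = np.select(
    [df['Median ASP'] <= 3, df['Median ASP'] <= 50],
    ['Category 1', 'Category 2'],
    default='Other',
)

# Broad condition first: "<= 50" already captures everything "<= 3",
# so 'Category 1' is never assigned.
broad_first = np.select(
    [df['Median ASP'] <= 50, df['Median ASP'] <= 3],
    ['Category 2', 'Category 1'],
    default='Other',
)

print(narrow_first.tolist())
print(broad_first.tolist())
```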

Your code:

That said, your code works as posted when I run it. It is just less efficient than the vectorized NumPy approaches above:

def PriceCat (row):
    if row['Median ASP'] <= 50:
        return 'Category 1'
    return 'Other'

df['PriceCatColumn'] = df.apply(PriceCat, axis=1)

>>> df
   Median ASP PriceCatColumn
0           1     Category 1
1           2     Category 1
2          51          Other
3          52          Other
4           5     Category 1
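
Since the function itself is fine, the problem is probably in how df was created. This is only a guess, because the question does not show that part. One common way to get exactly this message is for df to be bound to the DataFrame class instead of an instance (for example, df = pd.DataFrame with no parentheses). In that case PriceCat gets bound to self and func is reported as missing. A minimal sketch of that assumed scenario:

```python
import pandas as pd

def PriceCat(row):
    if row['Median ASP'] <= 50:
        return 'Category 1'
    return 'Other'

# Assumed mistake (not confirmed by the question): the class, not an instance
df_class = pd.DataFrame

message = None
try:
    # PriceCat is taken as `self`, so `func` is never supplied
    df_class.apply(PriceCat, axis=1)
except TypeError as exc:
    message = str(exc)
print(message)

# With an actual DataFrame instance the same call works
df = pd.DataFrame({'Median ASP': [1, 51]})
result = df.apply(PriceCat, axis=1)
print(result.tolist())
```

Checking type(df) right before the failing line would confirm or rule this out.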

Upvotes: 3
