Aayush Panda

Reputation: 554

Not able to apply function properly to DataFrame column

I am working with a DataFrame that looks like this:

[Image: DataFrame]

I wanted to create a new column 'Named' in order to use the categorical column 'Name' in linear regression. I did the following to accomplish that goal:

def named(name):
    if name == 'UNNAMED':
        return 0
    else:
        return 1


df['Named'] = df['Name'].apply(lambda name: named(name))

However, that gives a column that consists only of the value 1.

The function works on its own, but for some reason doesn't behave when used in the DataFrame.apply method.

Upvotes: 0

Views: 524

Answers (2)

Quickbeam2k1

Reputation: 5437

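  1. You can solve this elegantly with the following, which returns a new DataFrame, so assign the result back to df:
df.assign(Named=lambda df: (df["Name"] != 'UNNAMED').astype(int))
  2. Your function is not vectorized: it expects a single value. DataFrame.apply passes each whole column, i.e. a Series object, to named, and that object clearly is not equal to 'UNNAMED', hence you get the 1. (Series.apply, as in df['Name'].apply(...), passes one value at a time.) Did you try applymap? It works element-wise (it was renamed to DataFrame.map in pandas 2.1) and behaves for me as you desire.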
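As a rough illustration of the vectorized approach in point 1, here is a minimal sketch. The sample names are made up, since the original DataFrame is not shown in the question.

```python
import pandas as pd

# Hypothetical sample data; the real DataFrame is not shown in the question.
df = pd.DataFrame({"Name": ["UNNAMED", "Rex", "UNNAMED", "Bella"]})

# Compare the whole column at once, then cast the boolean mask to 0/1.
# assign returns a new DataFrame, so the result is bound back to df.
df = df.assign(Named=lambda d: (d["Name"] != "UNNAMED").astype(int))

print(df)
```

The comparison runs on the column as a whole, so no Python-level function is called per row.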

Moreover, on a recent pandas version I can't reproduce your example; I see this error message instead:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
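The sketch below shows one way this error can come about, assuming the function ends up being called through DataFrame.apply rather than Series.apply. The two-row frame is hypothetical, and named is condensed to a single conditional expression.

```python
import pandas as pd

def named(name):
    # Same logic as the question's function, condensed.
    return 0 if name == "UNNAMED" else 1

# Hypothetical sample data.
df = pd.DataFrame({"Name": ["UNNAMED", "Rex"]})

# Series.apply calls named once per value.
per_value = df["Name"].apply(named).tolist()

# DataFrame.apply calls named once per column and passes a Series, so
# `name == "UNNAMED"` is a boolean Series that `if` cannot evaluate.
try:
    df[["Name"]].apply(named)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

Note the double brackets in df[["Name"]]: they select a one-column DataFrame, whereas df["Name"] selects a Series.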

Upvotes: 1

IoaTzimas

Reputation: 10624

The following should work:

df['Named'] = [0 if x.strip() == 'UNNAMED' else 1 for x in df['Name']]
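The strip() call matters only if the names carry stray whitespace, which is an assumption here because the actual data is not shown. A minimal sketch with made-up values, using the vectorized Series.str.strip instead of a Python-level loop:

```python
import pandas as pd

# Hypothetical data with stray whitespace around some names.
df = pd.DataFrame({"Name": [" UNNAMED", "UNNAMED ", "Rex", "UNNAMED"]})

# Without stripping, only the exact string 'UNNAMED' is recognised.
naive = (df["Name"] != "UNNAMED").astype(int).tolist()

# Strip surrounding whitespace first, then compare.
df["Named"] = (df["Name"].str.strip() != "UNNAMED").astype(int)

print(naive)
print(df["Named"].tolist())
```

If the two printed lists differ on the real data, whitespace in the 'Name' column is a likely reason for the column of ones.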

Upvotes: 1
