Reputation: 1465
I'm trying to label some values in a DataFrame in Pandas based on the value itself, in-place.
df = pd.read_csv('data/extrusion.csv')
# get list of columns that contain thickness
columns = [c for c in df.columns if 'SDickeIst'.lower() in c.lower()]
# create a function that returns the class based on value
def get_label(ser):
    ser.map(lambda x : x if x == 0 else 1)
df[columns].apply(get_label)
I would expect apply to take each column in turn and call get_label on it. In turn, get_label receives the column as the ser argument (a Series) and uses map to replace each element != 0 with 1.
Upvotes: 1
Views: 156
Reputation: 5503
get_label doesn't return anything. It needs to return the result of ser.map(lambda x : x if x == 0 else 1):
def get_label(ser):
    return ser.map(lambda x : x if x == 0 else 1)
Besides that, apply doesn't act in-place; it always returns a new object. Therefore you need to assign the result back:
df[columns] = df[columns].apply(get_label)
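To see both fixes working together, here is a minimal sketch. It assumes a small made-up DataFrame in place of data/extrusion.csv; the column names and values are hypothetical and only serve to make the example runnable.

```python
import pandas as pd

# Hypothetical stand-in for data/extrusion.csv (names and values are made up)
df = pd.DataFrame({
    "SDickeIst_1": [0.0, 1.2, 0.0, 3.4],
    "SDickeIst_2": [2.5, 0.0, 0.0, 0.7],
    "Speed": [10, 12, 11, 13],
})

# Select the thickness columns with a case-insensitive match
columns = [c for c in df.columns if "sdickeist" in c.lower()]

def get_label(ser):
    # Return the mapped Series; without `return` the function yields None
    return ser.map(lambda x: x if x == 0 else 1)

# apply builds a new object, so assign it back to the selected columns
df[columns] = df[columns].apply(get_label)

labels_1 = df["SDickeIst_1"].tolist()
labels_2 = df["SDickeIst_2"].tolist()
speed = df["Speed"].tolist()
```

Only the selected columns are overwritten; any other column (Speed in this sketch) is left as it was.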
But in this simple case, using DataFrame.where should be much faster if you are dealing with large DataFrames.
df[columns] = df[columns].where(lambda x: x == 0, 1)
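As a quick sanity check that the two approaches agree, the sketch below compares them on a small hypothetical DataFrame (the data is invented for illustration). where keeps every value for which the condition holds and replaces the rest with the second argument, so zeros stay and everything else becomes 1.

```python
import pandas as pd

# Hypothetical thickness data, made up for this comparison
df = pd.DataFrame({
    "SDickeIst_1": [0.0, 1.2, 0.0, 3.4],
    "SDickeIst_2": [2.5, 0.0, 0.0, 0.7],
})
columns = list(df.columns)

# Element-wise version: one Python-level lambda call per value
via_map = df[columns].apply(lambda ser: ser.map(lambda x: x if x == 0 else 1))

# Vectorized version: keep values where the condition holds, else use 1
via_where = df[columns].where(lambda x: x == 0, 1)

# True if every cell matches between the two results
same = bool((via_map == via_where).all().all())
```

Neither call modifies df here; both return new objects, so the assignment back to df[columns] is still needed to update the original.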
Upvotes: 1