Alexandru Antochi

Reputation: 1465

Map elements of multiple columns in Pandas

I'm trying to label some values in a DataFrame in Pandas based on the value itself, in-place.

df = pd.read_csv('data/extrusion.csv')
# get list of columns that contain thickness
columns = [c for c in df.columns if 'SDickeIst'.lower() in c.lower()]

# create a function that returns the class based on value
def get_label(ser):
    ser.map(lambda x : x if x == 0 else 1)

df[columns].apply(get_label)

I would expect apply to take each column in turn and call get_label on it. get_label would then receive the column as a Series (the ser argument) and use map to replace every element != 0 with 1.

Upvotes: 1

Views: 156

Answers (1)

Rodalm

Reputation: 5503

get_label doesn't return anything.

You want to return ser.map(lambda x : x if x == 0 else 1).

def get_label(ser):
    return ser.map(lambda x : x if x == 0 else 1)

Besides that, apply doesn't act in place; it always returns a new object. Therefore you need to assign the result back:

df[columns] = df[columns].apply(get_label)
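To see both fixes together, here is a minimal sketch. The small DataFrame is a hypothetical stand-in for data/extrusion.csv, and its column names are made up for illustration; only get_label and the assignment come from the answer above.

```python
import pandas as pd

# Hypothetical sample data standing in for data/extrusion.csv
df = pd.DataFrame({
    "SDickeIst_1": [0.0, 1.2, 0.0],
    "SDickeIst_2": [3.4, 0.0, 5.6],
    "Speed": [10, 20, 30],
})

# Select the thickness columns by name, as in the question
columns = [c for c in df.columns if "sdickeist" in c.lower()]

def get_label(ser):
    # Return the mapped Series; without `return` the function yields None
    return ser.map(lambda x: x if x == 0 else 1)

# apply builds a new object and leaves df untouched
labeled = df[columns].apply(get_label)
before_assignment = df["SDickeIst_1"].tolist()

# Assigning back is what actually updates df
df[columns] = labeled
after_assignment = df["SDickeIst_1"].tolist()
```

Here before_assignment still holds the raw thickness values, while after_assignment holds only the labels 0 and 1. Columns not listed in columns (Speed in this sketch) are left as they were.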

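But in this simple case, DataFrame.where should be much faster on large DataFrames, because the comparison runs on whole columns at once instead of calling a Python lambda per element. where keeps the values for which the condition holds (here x == 0) and replaces all others with 1.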

df[columns] = df[columns].where(lambda x: x == 0, 1)
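As a quick check that the two approaches agree, the following sketch compares them on a small hypothetical DataFrame (the data and column names are assumptions, not taken from the original CSV). Both results are cast to float before comparing so that a possible dtype difference does not affect the equality check.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only
df = pd.DataFrame({
    "SDickeIst_1": [0.0, 1.2, 0.0],
    "SDickeIst_2": [3.4, 0.0, 5.6],
})
columns = list(df.columns)

# Element-wise version from the answer: apply + map
via_apply = df[columns].apply(lambda ser: ser.map(lambda x: x if x == 0 else 1))

# Vectorized version: keep entries equal to 0, replace everything else with 1
via_where = df[columns].where(lambda x: x == 0, 1)

# Compare values after normalizing dtypes
same = via_apply.astype(float).equals(via_where.astype(float))
```

Note that both versions treat NaN as "not zero" and label it 1, since NaN == 0 is False.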

Upvotes: 1
