Reputation: 9159
I'm trying to find the most elegant way to apply a really simple transformation to values in different columns, with each column having its own condition. So given a dataframe like this:
   A           B  C  D      E    F
0  1  2013-01-02  1  3   test  foo
1  1  2013-01-02  1  3  train  foo
2  1  2013-01-02  1  3   test  foo
3  1  2013-01-02  1  3  train  foo
I want a function that adjusts the values in each column only if a second column has a specific value. In other words, in pseudocode:
df['C'] = -1 if df['E'] == "test" else df['C']
df['D'] = -2 if df['E'] == "test" else df['D']
and so forth for the next columns.
I was thinking the where function in pandas would come in handy here, but I wasn't sure how to apply it. I could do the below, but it does not seem very efficient and I would have to create a different function for each column:
def col(df):
    if df['col1'] == "value":
        return -1.00
    else:
        return relative_buckets['col1']
Upvotes: 1
Views: 688
Reputation: 352979
You can use .loc with a boolean series:
>>> df
   A           B  C  D      E    F
0  1  2013-01-02  1  3   test  foo
1  1  2013-01-02  1  3  train  foo
2  1  2013-01-02  1  3   test  foo
3  1  2013-01-02  1  3  train  foo
>>> df.loc[df.E == "test", "C"] = -1
>>> df
   A           B   C  D      E    F
0  1  2013-01-02  -1  3   test  foo
1  1  2013-01-02   1  3  train  foo
2  1  2013-01-02  -1  3   test  foo
3  1  2013-01-02   1  3  train  foo
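Using .loc is preferable to trying to affect columns directly, for example through chained indexing such as df[df.E == "test"]["C"] = -1, because of view vs. copy issues (see the pandas documentation on returning a view versus a copy for the gory details).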
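Since the question mentions where: below is a minimal sketch of the same single-column update written with Series.mask and Series.where instead of .loc. The frame is rebuilt from the printed output, so the dtypes and the names is_test, c_masked and c_where are assumptions for illustration, not part of the original post.

```python
import pandas as pd

# Rebuild the example frame from the printed output; dtypes are assumed.
df = pd.DataFrame({
    "A": [1, 1, 1, 1],
    "B": pd.to_datetime(["2013-01-02"] * 4),
    "C": [1, 1, 1, 1],
    "D": [3, 3, 3, 3],
    "E": ["test", "train", "test", "train"],
    "F": ["foo"] * 4,
})

# Boolean Series: True on the rows where E equals "test".
is_test = df["E"] == "test"

# mask replaces the values where the condition is True.
c_masked = df["C"].mask(is_test, -1)

# where keeps the values where the condition is True,
# so the condition has to be negated to get the same result.
c_where = df["C"].where(~is_test, -1)

# Assigning the whole column back involves no chained indexing.
df["C"] = c_masked
```

Both variants build a new Series rather than modifying df in place, so the last line assigns the result back to the column.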
If you want to change multiple columns at once, you can do that too:
>>> df.loc[df.E == "test", ["C","D"]] = [888, 999]
>>> df
   A           B    C    D      E    F
0  1  2013-01-02  888  999   test  foo
1  1  2013-01-02    1    3  train  foo
2  1  2013-01-02  888  999   test  foo
3  1  2013-01-02    1    3  train  foo
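For the case in the question, where each column gets its own replacement value (-1 for C, -2 for D, and so forth), the same .loc pattern could be driven from a dictionary. This is only a sketch: the replacements mapping and the rebuilt frame are hypothetical names and data chosen for illustration.

```python
import pandas as pd

# Rebuild the example frame from the printed output; dtypes are assumed.
df = pd.DataFrame({
    "A": [1, 1, 1, 1],
    "B": pd.to_datetime(["2013-01-02"] * 4),
    "C": [1, 1, 1, 1],
    "D": [3, 3, 3, 3],
    "E": ["test", "train", "test", "train"],
    "F": ["foo"] * 4,
})

# Hypothetical mapping: column name -> value to write on the "test" rows.
replacements = {"C": -1, "D": -2}

# Compute the row condition once and reuse it for every column.
is_test = df["E"] == "test"

# One .loc assignment per column; rows where E is not "test" are untouched.
for column, value in replacements.items():
    df.loc[is_test, column] = value
```

This avoids writing a separate function per column: adding another column only means adding an entry to the mapping.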
Upvotes: 1