horatio1701d

Reputation: 9159

Pandas Multiple Conditions Function based on Column

I'm trying to find the most elegant way to apply a really simple transformation to values in different columns, with each column having its own condition. Given a DataFrame like this:

   A           B  C  D      E    F
0  1  2013-01-02  1  3   test  foo
1  1  2013-01-02  1  3  train  foo
2  1  2013-01-02  1  3   test  foo
3  1  2013-01-02  1  3  train  foo

I want a function that adjusts the values in each column only if a second column has a specific value. In other words (pseudocode):

df['C'] = -1 if df['E'] == "test" else df['C']  # next column...
df['D'] = -2 if df['E'] == "test" else df['D']  # and so forth

I was thinking the where function in pandas would come in handy here, but I wasn't sure how to apply it. I could do the below, but it doesn't seem very efficient, and I would have to create a different function for each column:

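def col(row):
    # Meant for row-wise use, e.g. df.apply(col, axis=1): receives one row.
    if row['col1'] == "value":
        return -1.00
    else:
        # Otherwise keep the row's existing value.
        return row['col1']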

Upvotes: 1

Views: 688

Answers (1)

DSM

Reputation: 352979

You can use .loc with a boolean series:

>>> df
   A           B  C  D      E    F
0  1  2013-01-02  1  3   test  foo
1  1  2013-01-02  1  3  train  foo
2  1  2013-01-02  1  3   test  foo
3  1  2013-01-02  1  3  train  foo
>>> df.loc[df.E == "test", "C"] = -1
>>> df
   A           B  C  D      E    F
0  1  2013-01-02 -1  3   test  foo
1  1  2013-01-02  1  3  train  foo
2  1  2013-01-02 -1  3   test  foo
3  1  2013-01-02  1  3  train  foo

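Using .loc is preferable to chained indexing (selecting the rows first and then assigning to a column of the result) because of view vs. copy issues: the chained form may end up modifying a temporary copy instead of df. The "Returning a view versus a copy" section of the pandas indexing documentation has the gory details.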
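Since the question mentions where, here is a minimal self-contained sketch comparing the .loc assignment above with Series.mask and Series.where. The frame is a hand-built reconstruction of the one in the question, and the names is_test, by_loc, by_mask and by_where are made up for illustration.

```python
import pandas as pd

# Hand-built stand-in for the frame shown in the question (an assumption).
df = pd.DataFrame({
    "A": [1, 1, 1, 1],
    "B": ["2013-01-02"] * 4,
    "C": [1, 1, 1, 1],
    "D": [3, 3, 3, 3],
    "E": ["test", "train", "test", "train"],
    "F": ["foo"] * 4,
})

# Boolean Series: True on the rows whose E value is "test".
is_test = df["E"] == "test"

# 1) In-place assignment through .loc, as in the answer.
by_loc = df.copy()
by_loc.loc[is_test, "C"] = -1

# 2) Series.mask replaces values where the condition is True.
by_mask = df.copy()
by_mask["C"] = df["C"].mask(is_test, -1)

# 3) Series.where keeps values where the condition is True,
#    so the condition has to be inverted.
by_where = df.copy()
by_where["C"] = df["C"].where(~is_test, -1)

print(by_loc["C"].tolist())
```

All three leave the original df untouched here because each variant works on a copy; in real code you would normally pick just one of them.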

If you want to change multiple columns at once, you can do that too:

>>> df.loc[df.E == "test", ["C","D"]] = [888, 999]
>>> df
   A           B    C    D      E    F
0  1  2013-01-02  888  999   test  foo
1  1  2013-01-02    1    3  train  foo
2  1  2013-01-02  888  999   test  foo
3  1  2013-01-02    1    3  train  foo
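To get the per-column values from the question (-1 for C, -2 for D) without writing one function per column, one option is to keep the replacements in a dict and loop over it with .loc. This is only a sketch: replace_where and the replacements mapping are hypothetical names, and the frame is again a hand-built stand-in for the one in the question.

```python
import pandas as pd


def replace_where(frame, condition, replacements):
    """Return a copy of frame where, on rows matching condition,
    each column in replacements is set to its mapped value.

    Hypothetical helper for illustration, not a pandas function.
    """
    result = frame.copy()
    for column, value in replacements.items():
        # Same .loc pattern as in the answer, once per column.
        result.loc[condition, column] = value
    return result


# Hand-built stand-in for the frame shown in the question (an assumption).
df = pd.DataFrame({
    "A": [1, 1, 1, 1],
    "B": ["2013-01-02"] * 4,
    "C": [1, 1, 1, 1],
    "D": [3, 3, 3, 3],
    "E": ["test", "train", "test", "train"],
    "F": ["foo"] * 4,
})

# Column name -> value to write on the "test" rows.
replacements = {"C": -1, "D": -2}

adjusted = replace_where(df, df["E"] == "test", replacements)
print(adjusted[["C", "D", "E"]])
```

Returning a copy keeps the helper free of side effects; if you want to modify df in place instead, drop the copy and assign through .loc directly.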

Upvotes: 1
