jason
jason

Reputation: 4429

pandas if then statement without looping

All I'm trying to do is add columns data1 and data2 if in the same row letters is a and subtract if it is c. multipy if it is b. Here is my code.

import pandas as pd

a=[['Date', 'letters', 'data1', 'data2'], ['1/2/2014', 'a', 6, 1], ['1/2/2014', 'a', 3, 1], ['1/3/2014', 'c', 1, 3],['1/3/2014', 'b', 3, 5]]
df = pd.DataFrame.from_records(a[1:],columns=a[0])

df['result']=df['data1']
for i in range(0,len(df)):
    if df['letters'][i]=='a':
        df['result'][i]=df['data1'][i]+df['data2'][i]
    if df['letters'][i]=='b':
        df['result'][i]=df['data1'][i]*df['data2'][i]
    if df['letters'][i]=='c':
        df['result'][i]=df['data1'][i]-df['data2'][i]

>>> df
       Date letters  data1  data2  result
0  1/2/2014       a      6      1       7
1  1/2/2014       a      3      1       4
2  1/3/2014       c      1      3      -2
3  1/3/2014       b      3      5      15

My question: is there a way to do it in one line without looping? something to the spirit of:

df['result']=df['result'].map(lambda x:df['data1'][i]+df['data2'][i] if x =='a' df['data1'][i]-df['data2'][i] elif x =='c' else x)`

Upvotes: 0

Views: 1892

Answers (2)

Ffisegydd
Ffisegydd

Reputation: 53668

You can use df.apply in combination with a lambda function. You have to use the keyword argument axis=1 to ensure you work on rows as opposed to the columns.

import pandas as pd

a=[['Date', 'letters', 'data1', 'data2'], ['1/2/2014', 'a', 6, 1], ['1/2/2014', 'a', 3, 1], ['1/3/2014', 'c', 1, 3]]
df = pd.DataFrame.from_records(a[1:],columns=a[0])

from operator import add, sub, mul

d = dict(a=add, b=mul, c=sub)

df['result'] = df.apply(lambda r: d[r['letters']](r['data1'], r['data2']), axis=1)

This will use the dictionary d to get the function you wish to use (add, sub, or mul).

Original solution below

df['result'] = df.apply(lambda r: r['data1'] + r['data2'] if r['letters'] == 'a' 
                        else r['data1'] - r['data2'] if r['letters'] == 'c' 
                        else r['data1'] * r['data2'], axis=1)

print(df)
       Date letters  data1  data2  result
0  1/2/2014       a      6      1       7
1  1/2/2014       b      3      1       3
2  1/3/2014       c      1      3      -2

The lambda function is a bit complex now so I'll go into it in a bit more detail...

The lambda function uses a so-called ternary operator to make boolean conditions inside one line, a typical ternary expession is of the form

a if b else c

Unfortunately you can't have an elif with a ternary expression, but what you can do is place another one inside the else statement, then it becomes

a if b else c if d else e

Upvotes: 1

behzad.nouri
behzad.nouri

Reputation: 77941

You can use the .where method:

where(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True) method of pandas.core.series.Series instance

Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.

as in:

>>> df['data1'] + df['data2'].where(df['letters'] == 'a', - df['data2'])
0    7
1    4
2   -2
dtype: int64

alternatively, numpy.where:

>>> df['data1'] + np.where(df['letters'] == 'a', 1, -1) * df['data2']
0    7
1    4
2   -2
dtype: int64

Upvotes: 1

Related Questions