Reputation: 281
I have a Pandas dataframe like this:
A B C D
0 month month+1 quarter+1 season+1
1 season month+5 quarter+3 season+2
2 day month+1 quarter+2 season+1
3 year month+3 quarter+4 season+2
4 quarter month+2 quarter+1 season+1
5 month month+4 quarter+1 season+2
I would like to insert a new column called 'E' based on several IF conditions: if column 'A' equals 'month', return the value in 'B'; if column 'A' equals 'quarter', return the value in 'C'; if column 'A' equals 'season', return the value in 'D'; otherwise, return the value in column 'A'. The expected output is:
A B C D E
0 month month+1 quarter+1 season+1 month+1
1 season month+5 quarter+3 season+2 season+2
2 day month+1 quarter+2 season+1 day
3 year month+3 quarter+4 season+2 year
4 quarter month+2 quarter+1 season+1 quarter+1
5 month month+4 quarter+1 season+2 month+4
I am having trouble doing this. I have tried playing around with a function but it did not work. See my attempt:
def f(row):
    if row['A'] == 'month':
        val = ['B']
    elif row['A'] == 'quarter':
        val = ['C']
    elif row['A'] == 'season':
        val = ['D']
    else:
        val = ['A']
    return val

df['E'] = df.apply(f, axis=1)
EDITED: changed the last else to return column 'A'.
Upvotes: 2
Views: 3373
Reputation: 30920
First, I recommend reading: When should I want to use apply() in my code.
I would use Series.replace
df['E'] = df['A'].replace(['month','quarter','season'],
[df['B'], df['C'], df['D']])
or numpy.select
import numpy as np

cond = [df['A'].eq('month'), df['A'].eq('quarter'), df['A'].eq('season')]
values = [df['B'], df['C'], df['D']]
df['E'] = np.select(cond, values, default=df['A'])
A B C D E
0 month month+1 quarter+1 season+1 month+1
1 season month+5 quarter+3 season+2 season+2
2 day month+1 quarter+2 season+1 day
3 year month+3 quarter+4 season+2 year
4 quarter month+2 quarter+1 season+1 quarter+1
5 month month+4 quarter+1 season+2 month+4
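For anyone who wants to run the numpy.select approach end to end, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the sample in the question, so the construction code is my own assumption about how the data is stored (plain string columns).

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (assumed to hold plain strings)
df = pd.DataFrame({
    "A": ["month", "season", "day", "year", "quarter", "month"],
    "B": ["month+1", "month+5", "month+1", "month+3", "month+2", "month+4"],
    "C": ["quarter+1", "quarter+3", "quarter+2", "quarter+4", "quarter+1", "quarter+1"],
    "D": ["season+1", "season+2", "season+1", "season+2", "season+1", "season+2"],
})

# One boolean mask per condition, checked in order
cond = [df["A"].eq("month"), df["A"].eq("quarter"), df["A"].eq("season")]
# The column to take values from when the matching mask is True
values = [df["B"], df["C"], df["D"]]

# Rows matching none of the masks fall back to column A
df["E"] = np.select(cond, values, default=df["A"])
print(df)
```

The masks and the value columns are paired by position, so the order of the two lists has to match.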
Upvotes: 4
Reputation: 25239
Just use np.select
c1 = df['A'] == 'month'
c2 = df['A'] == 'quarter'
c3 = df['A'] == 'season'
df['E'] = np.select([c1, c2, c3], [df['B'], df['C'], df['D']], df['A'])
Out[271]:
A B C D E
0 month month+1 quarter+1 season+1 month+1
1 season month+5 quarter+3 season+2 season+2
2 day month+1 quarter+2 season+1 day
3 year month+3 quarter+4 season+2 year
4 quarter month+2 quarter+1 season+1 quarter+1
5 month month+4 quarter+1 season+2 month+4
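If the number of conditions grows, the masks and choices can be built from a dictionary instead of being written out one by one. This is only a sketch: the name mapping is hypothetical, and the small dataframe is rebuilt by hand from the question.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (assumed to hold plain strings)
df = pd.DataFrame({
    "A": ["month", "season", "day", "year", "quarter", "month"],
    "B": ["month+1", "month+5", "month+1", "month+3", "month+2", "month+4"],
    "C": ["quarter+1", "quarter+3", "quarter+2", "quarter+4", "quarter+1", "quarter+1"],
    "D": ["season+1", "season+2", "season+1", "season+2", "season+1", "season+2"],
})

# Hypothetical lookup: value in column A -> column to read the result from
mapping = {"month": "B", "quarter": "C", "season": "D"}

# Build the masks and the choices in the same (dictionary) order
conditions = [df["A"] == key for key in mapping]
choices = [df[col] for col in mapping.values()]

# Anything not listed in the mapping keeps its value from column A
df["E"] = np.select(conditions, choices, default=df["A"])
print(df)
```

Adding a new rule then only means adding one entry to the dictionary.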
Upvotes: 3
Reputation: 4459
You probably need to fix your code like this:
def f(row):
    if row['A'] == 'month':
        val = row['B']
    elif row['A'] == 'quarter':
        val = row['C']
    elif row['A'] == 'season':
        val = row['D']
    else:
        # fall back to column 'A', as the question asks
        val = row['A']
    return val

df['E'] = df.apply(f, axis=1)
Note: you forgot to index into row, so you were returning a list instead of the cell value:
val = ['B'] # before
val = row['B'] # after
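To make the difference concrete, here is a small sketch using a single hand-built row (the Series below is an assumption that mirrors the first row of the sample data):

```python
import pandas as pd

# One row shaped like what df.apply(f, axis=1) passes to f
row = pd.Series({"A": "month", "B": "month+1", "C": "quarter+1", "D": "season+1"})

before = ['B']      # a plain Python list containing the string 'B'
after = row['B']    # the value stored under label 'B' in this row

print(before)  # ['B']
print(after)   # month+1
```

With the original code, every cell in the new column ends up holding a one-element list such as ['B'] rather than the value from that column.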
Edit: This is just to point out the problem in the code. For better approaches, see the other answers that use numpy.select.
Upvotes: 1