Reputation:
I have a Dataframe, df, and wondered how you perform If and Else If statements, and if neither these 'if/elseif' statements are met rows are to remain unchanged. So, If df ['Age'] equals 9, (Score 1 = 5, Score 2 = 7) Otherwise, if df['Age'] equals 10, (Score 1 = 10, Score 2 = 12) Otherwise leave rows unchanged. Using a lambda function would be ideal.
Before;
Age Score 1 Score 2 AA
9 13.0 3.5 144
10 12.0 3.1 133
9 11.0 2.8 100
8 14.0 1.5 110
14 14.0 1.2 97
After;
Age Score 1 Score 2 AA
9 5.0 7.0 144
10 10.0 12.0 133
9 5.0 7.0 100
8 14.0 1.5 110
14 14.0 1.2 97
Upvotes: 3
Views: 92
Reputation: 862581
One idea is use DataFrame.loc
with condition and specify both columns in list - so possible assign list [5,7]
:
df.loc[df['Age'] == 9, ['Score 1','Score 2']] = [5,7]
df.loc[df['Age'] == 10, ['Score 1','Score 2']] = [10,12]
print (df)
Age Score 1 Score 2 AA
0 9 5.0 7.0 144
1 10 10.0 12.0 133
2 9 5.0 7.0 100
3 8 14.0 1.5 110
4 14 14.0 1.2 97
Another is use numpy.select
with broadcasting, because working with both columns:
m1 = df['Age'].eq(9).values[:, None]
m2 = df['Age'].eq(10).values[:, None]
v1 = [5,7]
v2 = [10, 12]
df[['Score 1','Score 2']] = np.select([m1, m2], [v1, v2], default = df[['Score 1','Score 2']])
print (df)
Age Score 1 Score 2 AA
0 9 5.0 7.0 144
1 10 10.0 12.0 133
2 9 5.0 7.0 100
3 8 14.0 1.5 110
4 14 14.0 1.2 97
Solution with lambda is possible, but really slow:
s = pd.Series([5, 7], index=['Score 1','Score 2'])
f = lambda x: s if x['Age'] == 9 else x[['Score 1', 'Score 2']]
df[['Score 1','Score 2']] = df.apply(f, axis=1)
print (df)
Age Score 1 Score 2 AA
0 9 5.0 7.0 144
1 10 12.0 3.1 133
2 9 5.0 7.0 100
3 8 14.0 1.5 110
4 14 14.0 1.2 97
Performance:
#5k rows
df = pd.concat([df] * 1000, ignore_index=True)
In [113]: %timeit df.loc[df['Age'] == 9, ['Score 1','Score 2']] = [5,7]
2.35 ms ± 28.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [114]: %timeit df[['Score 1','Score 2']] = df.apply(lambda x: pd.Series([5, 7], index=['Score 1','Score 2']) if x['Age'] == 9 else x[['Score 1', 'Score 2']], axis=1)
2.88 s ± 138 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Performance first 2 solutions for more rows:
#50k rows
df = pd.concat([df] * 10000, ignore_index=True)
In [120]: %%timeit
...: df.loc[df['Age'] == 9, ['Score 1','Score 2']] = [5,7]
...: df.loc[df['Age'] == 10, ['Score 1','Score 2']] = [10,12]
...:
6.33 ms ± 59.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [121]: %%timeit
...: df[['Score 1','Score 2']] = np.select([df['Age'].eq(9).values[:, None], df['Age'].eq(10).values[:, None]], [[5,7], [10, 12]], df[['Score 1','Score 2']])
...:
19.1 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Upvotes: 1