user11357465
user11357465

Reputation:

How do you perform If and Else If statements within a DataFrame?

I have a Dataframe, df, and wondered how you perform If and Else If statements, and if neither these 'if/elseif' statements are met rows are to remain unchanged. So, If df ['Age'] equals 9, (Score 1 = 5, Score 2 = 7) Otherwise, if df['Age'] equals 10, (Score 1 = 10, Score 2 = 12) Otherwise leave rows unchanged. Using a lambda function would be ideal.

Before;

Age    Score 1    Score 2    AA

9      13.0         3.5      144
10     12.0         3.1      133
9      11.0         2.8      100
8      14.0         1.5      110
14     14.0         1.2      97

After;

Age    Score 1    Score 2    AA

9      5.0          7.0      144
10     10.0         12.0      133
9      5.0          7.0      100
8      14.0         1.5      110
14     14.0         1.2      97

Upvotes: 3

Views: 92

Answers (1)

jezrael
jezrael

Reputation: 862581

One idea is use DataFrame.loc with condition and specify both columns in list - so possible assign list [5,7]:

df.loc[df['Age'] == 9, ['Score 1','Score 2']] = [5,7]
df.loc[df['Age'] == 10, ['Score 1','Score 2']] = [10,12]
print (df)
   Age  Score 1  Score 2   AA
0    9      5.0      7.0  144
1   10     10.0     12.0  133
2    9      5.0      7.0  100
3    8     14.0      1.5  110
4   14     14.0      1.2   97

Another is use numpy.select with broadcasting, because working with both columns:

m1 = df['Age'].eq(9).values[:, None]
m2 = df['Age'].eq(10).values[:, None]

v1 = [5,7]
v2 = [10, 12]

df[['Score 1','Score 2']] = np.select([m1, m2], [v1, v2], default = df[['Score 1','Score 2']])

print (df)
   Age  Score 1  Score 2   AA
0    9      5.0      7.0  144
1   10     10.0     12.0  133
2    9      5.0      7.0  100
3    8     14.0      1.5  110
4   14     14.0      1.2   97

Solution with lambda is possible, but really slow:

s = pd.Series([5, 7], index=['Score 1','Score 2'])
f = lambda x: s if  x['Age'] == 9 else x[['Score 1', 'Score 2']]
df[['Score 1','Score 2']] = df.apply(f, axis=1)
print (df)
   Age  Score 1  Score 2   AA
0    9      5.0      7.0  144
1   10     12.0      3.1  133
2    9      5.0      7.0  100
3    8     14.0      1.5  110
4   14     14.0      1.2   97

Performance:

#5k rows
df = pd.concat([df] * 1000, ignore_index=True)

In [113]: %timeit df.loc[df['Age'] == 9, ['Score 1','Score 2']] = [5,7]
2.35 ms ± 28.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [114]: %timeit df[['Score 1','Score 2']] = df.apply(lambda x: pd.Series([5, 7], index=['Score 1','Score 2']) if  x['Age'] == 9 else x[['Score 1', 'Score 2']], axis=1)
2.88 s ± 138 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Performance first 2 solutions for more rows:

#50k rows
df = pd.concat([df] * 10000, ignore_index=True)

In [120]: %%timeit
     ...: df.loc[df['Age'] == 9, ['Score 1','Score 2']] = [5,7]
     ...: df.loc[df['Age'] == 10, ['Score 1','Score 2']] = [10,12]
     ...: 
6.33 ms ± 59.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [121]: %%timeit
     ...: df[['Score 1','Score 2']] = np.select([df['Age'].eq(9).values[:, None], df['Age'].eq(10).values[:, None]], [[5,7], [10, 12]], df[['Score 1','Score 2']])
     ...: 
19.1 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Upvotes: 1

Related Questions