Reputation: 366
I'm hoping to subtract columns where a separate column is equal to a specific value. For instance, where Val == A
, I want to subtract A - B
. However, where Val == B
, I want to subtract B - A
.
df = pd.DataFrame({
'Val' : ['A','B','A','B','A','B'],
'A_1' : [1, 2, 3, 4, 5, 6],
'A_2' : [2, 3, 4, 5, 6, 1],
'B_1' : [6, 5, 4, 3, 2, 1],
'B_2' : [6, 5, 4, 3, 2, 1],
})
for x in df['Val']:
if x == 'A':
df['1_Sum'] = df.iloc[:,-4] - df.iloc[:,-2]
df['2_Sum'] = df.iloc[:,-4] - df.iloc[:,-2]
elif x == 'B':
df['1_Sum'] = df.iloc[:,-2] - df.iloc[:,-4]
df['2_Sum'] = df.iloc[:,-2] - df.iloc[:,-4]
Intended output:
Val A_1 A_2 B_1 B_2 1_Sum 2_Sum
0 A 1 2 6 6 -5 -4
1 B 2 3 5 5 3 2
2 A 3 4 4 4 -1 0
3 B 4 5 3 3 -1 -2
4 A 5 6 2 2 3 4
5 B 6 1 1 1 -5 0
Upvotes: 2
Views: 375
Reputation: 862911
Use numpy.select
with subtract filtered columns by DataFrame.filter
, only necessary same order and same number of each column for A
and B
groups:
df1 = df.filter(like='A').sub(df.filter(like='B').to_numpy())
df2 = df.filter(like='B').sub(df.filter(like='A').to_numpy())
m1 = df['Val'].eq('A').to_numpy()[:, None]
m2 = df['Val'].eq('B').to_numpy()[:, None]
df3 = (pd.DataFrame(np.select([m1, m2], [df1, df2]), index=df.index)
.rename(columns=lambda x: f'{x+1}_Sum'))
df = df.join(df3)
print (df)
Val A_1 A_2 B_1 B_2 1_Sum 2_Sum
0 A 1 2 6 6 -5 -4
1 B 2 3 5 5 3 2
2 A 3 4 4 4 -1 0
3 B 4 5 3 3 -1 -2
4 A 5 6 2 2 3 4
5 B 6 1 1 1 -5 0
If want subtract each Series separately:
mask = df.Val=='A'
df["1_Sum"] = np.where(mask, df.A_1 - df.B_1, df.B_1 - df.A_1)
df["2_Sum"] = np.where(mask, df.A_2 - df.B_2, df.B_2 - df.A_2)
print (df)
Val A_1 A_2 B_1 B_2 1_Sum 2_Sum
0 A 1 2 6 6 -5 -4
1 B 2 3 5 5 3 2
2 A 3 4 4 4 -1 0
3 B 4 5 3 3 -1 -2
4 A 5 6 2 2 3 4
5 B 6 1 1 1 -5 0
EDIT:
Never use apply
for subtract values, because loops under the hood, so really slow, here test for DataFrame with 6k rows:
df = pd.DataFrame({
'Val' : ['A','B','A','B','A','B'],
'A_1' : [1, 2, 3, 4, 5, 6],
'A_2' : [2, 3, 4, 5, 6, 1],
'B_1' : [6, 5, 4, 3, 2, 1],
'B_2' : [6, 5, 4, 3, 2, 1],
})
df = pd.concat([df] * 1000, ignore_index=True)
df["1_Sum1"] = df.apply(lambda x: x.A_1 - x.B_1 if x.Val=='A' else x.B_1 - x.A_1, axis=1 )
df["1_Sum2"] = np.where(df.Val=='A', df.A_1 - df.B_1, df.B_1 - df.A_1)
In [77]: %timeit df["1_Sum1"] = df.apply(lambda x: x.A_1 - x.B_1 if x.Val=='A' else x.B_1 - x.A_1, axis=1 )
271 ms ± 7.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [78]: %timeit df["1_Sum2"] = np.where(df.Val=='A', df.A_1 - df.B_1, df.B_1 - df.A_1)
1.04 ms ± 4.22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Upvotes: 3
Reputation: 3926
Here you go:
df["1_Sum"] = df.apply(lambda x: x.A_1 - x.B_1 if x.Val=='A' else x.A_1 - x.B_1, axis=1 )
Similarly you can get the other col. No need to iterate.
Upvotes: 0