Reputation: 168
Say I have a DataFrame like following:
df = pd.DataFrame({
'group':['A', 'A', 'A', 'B', 'B', 'C'],
'amount':[100, -100, 50, 30, -30, 40]
})
If I would like to add another column as to check if the amount of each row can be paired (i.e. same amount, but 1 positive and 1 negative) within a group.
For example, in group A, 100 & -100 can be paired, then they will be True, while 50 cannot find a pair, then it's False (like the following table).
group | amount | pair |
---|---|---|
A | 100 | True |
A | -100 | True |
A | 50 | False |
B | 30 | True |
B | -30 | True |
C | 40 | False |
What would be the most efficient way to do this?
Upvotes: 0
Views: 849
Reputation: 35686
We can take the abs
of the amount
column, then create the pair
column based on where the values are DataFrame.duplicated
:
df['pair'] = df.assign(amount=df['amount'].abs()).duplicated(keep=False)
*keep=False
means both duplicated rows get True
. The right hand side can also be subset
if the DataFrame has more than these 2 columns.
df
:
group amount pair
0 A 100 True
1 A -100 True
2 A 50 False
3 B 30 True
4 B -30 True
5 C 40 False
Update to handle duplicate values, but ensure only positive negative pairs get matched using pivot_table
:
Updated DataFrame:
df = pd.DataFrame({
'group': ['A', 'A', 'A', 'A', 'B', 'B', 'C'],
'amount': [100, -100, 50, 50, 30, -30, 40]
})
Pivot to wide form and check for pairs:
df['abs_amount'] = df['amount'].abs()
df = df.join(
df.pivot_table(index=['group', 'abs_amount'],
columns=df['amount'].gt(0),
values='amount',
aggfunc='first')
.notnull().all(axis=1)
.rename('pair'),
on=['group', 'abs_amount']
).drop('abs_amount', axis=1)
df
:
group amount pair
0 A 100 True
1 A -100 True
2 A 50 False
3 A 50 False
4 B 30 True
5 B -30 True
6 C 40 False
The pivot_table
:
df['abs_amount'] = df['amount'].abs()
df.pivot_table(index=['group', 'abs_amount'],
columns=df['amount'].gt(0),
values='amount',
aggfunc='first')
amount False True
group abs_amount
A 50 NaN 50.0 # Multiple 50s but no -50
100 -100.0 100.0
B 30 -30.0 30.0
C 40 NaN 40.0
Ensure all values in the row:
df.pivot_table(index=['group', 'abs_amount'],
columns=df['amount'].gt(0),
values='amount',
aggfunc='first').notnull().all(axis=1)
group abs_amount
A 50 False
100 True
B 30 True
C 40 False
dtype: bool
Upvotes: 3