Artorias-7
Artorias-7

Reputation: 168

Pair rows in Pandas DataFrame if condition met

Say I have a DataFrame like following:

df = pd.DataFrame({
    'group':['A', 'A', 'A', 'B', 'B', 'C'], 
    'amount':[100, -100, 50, 30, -30, 40]
})

If I would like to add another column as to check if the amount of each row can be paired (i.e. same amount, but 1 positive and 1 negative) within a group.

For example, in group A, 100 & -100 can be paired, then they will be True, while 50 cannot find a pair, then it's False (like the following table).

group amount pair
A 100 True
A -100 True
A 50 False
B 30 True
B -30 True
C 40 False

What would be the most efficient way to do this?

Upvotes: 0

Views: 849

Answers (1)

Henry Ecker
Henry Ecker

Reputation: 35686

We can take the abs of the amount column, then create the pair column based on where the values are DataFrame.duplicated:

df['pair'] = df.assign(amount=df['amount'].abs()).duplicated(keep=False)

*keep=False means both duplicated rows get True. The right hand side can also be subset if the DataFrame has more than these 2 columns.

df:

  group  amount   pair
0     A     100   True
1     A    -100   True
2     A      50  False
3     B      30   True
4     B     -30   True
5     C      40  False

Update to handle duplicate values, but ensure only positive negative pairs get matched using pivot_table:

Updated DataFrame:

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'A', 'B', 'B', 'C'],
    'amount': [100, -100, 50, 50, 30, -30, 40]
})

Pivot to wide form and check for pairs:

df['abs_amount'] = df['amount'].abs()
df = df.join(
    df.pivot_table(index=['group', 'abs_amount'],
                   columns=df['amount'].gt(0),
                   values='amount',
                   aggfunc='first')
        .notnull().all(axis=1)
        .rename('pair'),
    on=['group', 'abs_amount']
).drop('abs_amount', axis=1)

df:

  group  amount   pair
0     A     100   True
1     A    -100   True
2     A      50  False
3     A      50  False
4     B      30   True
5     B     -30   True
6     C      40  False

The pivot_table:

df['abs_amount'] = df['amount'].abs()
df.pivot_table(index=['group', 'abs_amount'],
               columns=df['amount'].gt(0),
               values='amount',
               aggfunc='first')
amount            False   True
group abs_amount              
A     50            NaN   50.0  # Multiple 50s but no -50
      100        -100.0  100.0
B     30          -30.0   30.0
C     40            NaN   40.0

Ensure all values in the row:

df.pivot_table(index=['group', 'abs_amount'],
               columns=df['amount'].gt(0),
               values='amount',
               aggfunc='first').notnull().all(axis=1)
group  abs_amount
A      50            False
       100            True
B      30             True
C      40            False
dtype: bool

Upvotes: 3

Related Questions