Reputation: 11
I have 4 columns in a pandas dataframe and millions of rows. I need to compare all the rows and make a new column 'value' such that any rows should have the same values in columns 'ID', 'Amount' and 'date' and different values in column 'debit/credit' ( 'debit/credit' takes only debit, credit as values) for eg. the following table
ID | Amount | date | debit/credit |
---|---|---|---|
123 | 10 | 11JAN2021 | debit |
256 | 20 | 03FEB2021 | credit |
123 | 10 | 11JAN2021 | credit |
256 | 30 | 22FEB2021 | debit |
568 | 40 | 03MAR2021 | credit |
568 | 40 | 15MAR2021 | debit |
should give me something like this
ID | Amount | date | debit/credit | value |
---|---|---|---|---|
123 | 10 | 11JAN2021 | debit | 1 |
256 | 20 | 03FEB2021 | credit | 0 |
123 | 10 | 11JAN2021 | credit | 1 |
256 | 30 | 22FEB2021 | debit | 0 |
568 | 40 | 03MAR2021 | credit | 0 |
568 | 40 | 15MAR2021 | debit | 0 |
Upvotes: 1
Views: 871
Reputation: 260455
IIUC, you can use:
df['value'] = (df.groupby(['ID', 'Amount', 'date'])['debit/credit']
.transform('nunique').eq(2).astype(int)
)
output:
ID Amount date debit/credit value
0 123 10 11JAN2021 debit 1
1 256 20 03FEB2021 credit 0
2 123 10 11JAN2021 credit 1
3 256 30 22FEB2021 debit 0
4 568 40 03MAR2021 credit 0
5 568 40 15MAR2021 debit 0
Alternative using pivot_table
and merge
:
cols = ['ID', 'Amount', 'date']
s = (df.assign(x=True)
.pivot_table(index=cols, columns='debit/credit',
aggfunc=any, fill_value=False)
.all(1).astype(int)
)
out = df.merge(s.rename('value'), left_on=cols, right_index=True, how='left')
Upvotes: 2