unknown_max
unknown_max

Reputation: 11

I want to make a new column based on pandas dataframe rows having equal values in multiple columns and different values in 1 column

I have 4 columns in a pandas dataframe and millions of rows. I need to compare all the rows and make a new column 'value' such that any rows should have the same values in columns 'ID', 'Amount' and 'date' and different values in column 'debit/credit' ( 'debit/credit' takes only debit, credit as values) for eg. the following table

ID Amount date debit/credit
123 10 11JAN2021 debit
256 20 03FEB2021 credit
123 10 11JAN2021 credit
256 30 22FEB2021 debit
568 40 03MAR2021 credit
568 40 15MAR2021 debit

should give me something like this

ID Amount date debit/credit value
123 10 11JAN2021 debit 1
256 20 03FEB2021 credit 0
123 10 11JAN2021 credit 1
256 30 22FEB2021 debit 0
568 40 03MAR2021 credit 0
568 40 15MAR2021 debit 0

Upvotes: 1

Views: 871

Answers (1)

mozway
mozway

Reputation: 260455

IIUC, you can use:

df['value'] = (df.groupby(['ID', 'Amount', 'date'])['debit/credit']
                 .transform('nunique').eq(2).astype(int)
               )

output:

    ID  Amount       date debit/credit  value
0  123      10  11JAN2021        debit      1
1  256      20  03FEB2021       credit      0
2  123      10  11JAN2021       credit      1
3  256      30  22FEB2021        debit      0
4  568      40  03MAR2021       credit      0
5  568      40  15MAR2021        debit      0

Alternative using pivot_table and merge:

cols = ['ID', 'Amount', 'date']
s = (df.assign(x=True)
       .pivot_table(index=cols, columns='debit/credit',
                    aggfunc=any, fill_value=False)
       .all(1).astype(int)
    )
out = df.merge(s.rename('value'), left_on=cols, right_index=True, how='left')

Upvotes: 2

Related Questions