Shubham R
Shubham R

Reputation: 7644

check validity for flag column's in multiple rows pandas

i have a dataframe:

a      id   flag1    flag2
abc    1     1          0
123    1     0          1
xyz    2     1          0
111    2     0          1
qwe    3     1          0
qwe    3     1          0
mmm    4     1          0
222    4     0          1

i want to find the id number where both flag1 and flag2 are 1.

for eg. for id 1, in the first row, flag1 = 1 and flag2 = 0, and second row, flag1 = 0 and flag2 = 1.

my final output should look like this

a    id    flag1   flag2
abc    1     1          0
123    1     0          1
xyz    2     1          0
111    2     0          1
mmm    4     1          0
222    4     0          1

or only id column would also work [1,2,4] in a list

since for id = 3, flag1 was 1 in both the rows with id = 3 and flag 2 was 0, so i have to neglect it.

i was trying to write a func, but failed.

def checkValidTransactionRow(frame):
df['id'][(df['flag1']==1) & (df['flag2']==1) ].unique()

Upvotes: 2

Views: 237

Answers (2)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210832

Try this approach:

In [23]: ids = df.groupby('id')['flag1','flag2'].apply(lambda x: x.eq(1).any()).all(1)

In [24]: ids
Out[24]:
id
1     True
2     True
3    False
4     True
dtype: bool

In [25]: ids.index[ids]
Out[25]: Int64Index([1, 2, 4], dtype='int64', name='id')

Explanation:

In [26]: df.groupby('id')['flag1','flag2'].apply(lambda x: x.eq(1).any())
Out[26]:
   flag1  flag2
id
1   True   True
2   True   True
3   True  False
4   True   True

x.eq(1).any() is the same as (x == 1).any() - i.e. return True if at least one value in x series equals to 1, otherwise return False

UPDATE:

In [34]: ids.index[ids].values
Out[34]: array([1, 2, 4], dtype=int64)

In [35]: ids.index[ids].values.tolist()
Out[35]: [1, 2, 4]

Upvotes: 3

edyvedy13
edyvedy13

Reputation: 2296

I am sure there is a better way of achieving this but you can try:

df['count_max1'] = df.groupby(['id'])['flag1'].transform(max)
df['count_max2'] = df.groupby(['id'])['flag2'].transform(max)
# Select rows 
df[(df['count_max1'] ==1 & (df['count_max2'] == 1)]

What transform matrix does is:

a      id   flag1    flag2    count_max1   count_max2
abc    1     1          0         1            1
123    1     0          1         1            1
xyz    2     1          0         1            1
111    2     0          1         1            1
qwe    3     1          0         1            0
qwe    3     1          0         1            0
mmm    4     1          0         1            1
222    4     0          1         1            1

When you select the rows the final output will be:

a    id    flag1   flag2    count_max1  count_max2
abc    1     1          0       1            1
123    1     0          1       1            1
xyz    2     1          0       1            1
111    2     0          1       1            1
mmm    4     1          0       1            1
222    4     0          1       1            1 

you can drop the rows count_max1 and count_max2 later on

Upvotes: 1

Related Questions