Reputation: 7644
i have a dataframe:
a id flag1 flag2
abc 1 1 0
123 1 0 1
xyz 2 1 0
111 2 0 1
qwe 3 1 0
qwe 3 1 0
mmm 4 1 0
222 4 0 1
i want to find the id
number where both flag1 and flag2 are 1
.
for eg. for id 1, in the first row, flag1 = 1 and flag2 = 0, and second row, flag1 = 0 and flag2 = 1.
my final output should look like this
a id flag1 flag2
abc 1 1 0
123 1 0 1
xyz 2 1 0
111 2 0 1
mmm 4 1 0
222 4 0 1
or only id column would also work [1,2,4]
in a list
since for id = 3, flag1 was 1 in both the rows with id = 3 and flag 2 was 0, so i have to neglect it.
i was trying to write a func, but failed.
def checkValidTransactionRow(frame):
df['id'][(df['flag1']==1) & (df['flag2']==1) ].unique()
Upvotes: 2
Views: 237
Reputation: 210832
Try this approach:
In [23]: ids = df.groupby('id')['flag1','flag2'].apply(lambda x: x.eq(1).any()).all(1)
In [24]: ids
Out[24]:
id
1 True
2 True
3 False
4 True
dtype: bool
In [25]: ids.index[ids]
Out[25]: Int64Index([1, 2, 4], dtype='int64', name='id')
Explanation:
In [26]: df.groupby('id')['flag1','flag2'].apply(lambda x: x.eq(1).any())
Out[26]:
flag1 flag2
id
1 True True
2 True True
3 True False
4 True True
x.eq(1).any()
is the same as (x == 1).any()
- i.e. return True
if at least one value in x
series equals to 1
, otherwise return False
UPDATE:
In [34]: ids.index[ids].values
Out[34]: array([1, 2, 4], dtype=int64)
In [35]: ids.index[ids].values.tolist()
Out[35]: [1, 2, 4]
Upvotes: 3
Reputation: 2296
I am sure there is a better way of achieving this but you can try:
df['count_max1'] = df.groupby(['id'])['flag1'].transform(max)
df['count_max2'] = df.groupby(['id'])['flag2'].transform(max)
# Select rows
df[(df['count_max1'] ==1 & (df['count_max2'] == 1)]
What transform matrix does is:
a id flag1 flag2 count_max1 count_max2
abc 1 1 0 1 1
123 1 0 1 1 1
xyz 2 1 0 1 1
111 2 0 1 1 1
qwe 3 1 0 1 0
qwe 3 1 0 1 0
mmm 4 1 0 1 1
222 4 0 1 1 1
When you select the rows the final output will be:
a id flag1 flag2 count_max1 count_max2
abc 1 1 0 1 1
123 1 0 1 1 1
xyz 2 1 0 1 1
111 2 0 1 1 1
mmm 4 1 0 1 1
222 4 0 1 1 1
you can drop the rows count_max1 and count_max2 later on
Upvotes: 1