Reputation: 704
df:
make country other_columns sale
honda tokyo data 1
honda hirosima data 0
toyota tokyo data 1
toyota hirosima data 0
suzuki tokyo data 0
suzuki hirosima data 0
ferrari tokyo data 1
ferrari hirosima data 0
nissan tokyo data 1
nissan hirosima data 0
Condition: keep pairs that have 1,0
desired output
make country other_columns sale
honda tokyo data 1
honda hirosima data 0
toyota tokyo data 1
toyota hirosima data 0
ferrari tokyo data 1
ferrari hirosima data 0
nissan tokyo data 1
nissan hirosima data 0
What i tried (which is obviously not working )
outdf = df.groupby('sale').filter(lambda x: x > 0)
should i be grouping both the columns ('sale' and 'make')?
Upvotes: 2
Views: 1373
Reputation: 164673
Here's a solution using groupby
+ set
. This should be extendable and does not require ordering:
s = df.groupby('make')['sale'].apply(set)
res = df[df['make'].map(s) >= {0, 1}]
print(res)
make country other_columns sale
0 honda tokyo data 1
1 honda hirosima data 0
2 toyota tokyo data 1
3 toyota hirosima data 0
6 ferrari tokyo data 1
7 ferrari hirosima data 0
8 nissan tokyo data 1
9 nissan hirosima data 0
Upvotes: 2
Reputation: 29635
Indeed, you need groupby
but on the column 'make' and you can use filter
with a function creating a tuple
from 'sale' column:
outdf = df.groupby('make').filter(lambda x: tuple(x.sale)== (1,0))
and you get the expected output:
make country other_columns sale
0 honda tokyo data 1
1 honda hirosima data 0
2 toyota tokyo data 1
3 toyota hirosima data 0
6 ferrari tokyo data 1
7 ferrari hirosima data 0
8 nissan tokyo data 1
9 nissan hirosima data 0
if the order between 1 and 0 is not always satisfy for a same 'make', you can use a sort_values
on 'sale' column first
Upvotes: 6