groupby and filter pandas

df:

    make    country other_columns   sale
honda   tokyo   data    1
honda   hirosima    data    0
toyota  tokyo   data    1
toyota  hirosima    data    0
suzuki  tokyo   data    0
suzuki  hirosima    data    0
ferrari tokyo   data    1
ferrari hirosima    data    0
nissan  tokyo   data    1
nissan  hirosima    data    0

Condition: keep pairs that have 1,0

desired output

make    country other_columns   sale
honda   tokyo   data    1
honda   hirosima    data    0
toyota  tokyo   data    1
toyota  hirosima    data    0
ferrari tokyo   data    1
ferrari hirosima    data    0
nissan  tokyo   data    1
nissan  hirosima    data    0

What i tried (which is obviously not working )

outdf = df.groupby('sale').filter(lambda x: x > 0)

should i be grouping both the columns ('sale' and 'make')?

Upvotes: 2

Views: 1373

Answers (2)

jpp
jpp

Reputation: 164673

Here's a solution using groupby + set. This should be extendable and does not require ordering:

s = df.groupby('make')['sale'].apply(set)
res = df[df['make'].map(s) >= {0, 1}]

print(res)

      make   country other_columns  sale
0    honda     tokyo          data     1
1    honda  hirosima          data     0
2   toyota     tokyo          data     1
3   toyota  hirosima          data     0
6  ferrari     tokyo          data     1
7  ferrari  hirosima          data     0
8   nissan     tokyo          data     1
9   nissan  hirosima          data     0

Upvotes: 2

Ben.T
Ben.T

Reputation: 29635

Indeed, you need groupby but on the column 'make' and you can use filter with a function creating a tuple from 'sale' column:

outdf = df.groupby('make').filter(lambda x: tuple(x.sale)== (1,0))

and you get the expected output:

      make   country other_columns  sale
0    honda     tokyo          data     1
1    honda  hirosima          data     0
2   toyota     tokyo          data     1
3   toyota  hirosima          data     0
6  ferrari     tokyo          data     1
7  ferrari  hirosima          data     0
8   nissan     tokyo          data     1
9   nissan  hirosima          data     0

if the order between 1 and 0 is not always satisfy for a same 'make', you can use a sort_values on 'sale' column first

Upvotes: 6

Related Questions