Night Walker
Night Walker

Reputation: 21260

filtering dataframe on multiple conditions

data = {'year': ['11:23:19', '11:23:19', '11:24:19', '11:25:19', '11:25:19', '11:23:19', '11:23:19', '11:23:19', '11:23:19', '11:23:19'],
                'store_number': ['1944', '1945', '1946', '1948', '1948', '1949', '1947', '1948', '1949', '1947'],
                'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV', 'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
                'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
                'id': [10, 10, 11, 11, 11, 10, 10, 11, 11, 10]}

        stores = pd.DataFrame(data, columns=['retailer_name', 'store_number', 'year', 'amount', 'id'])
        stores.set_index(['retailer_name', 'store_number', 'year'], inplace=True)
        stores_grouped = stores.groupby(level=[0, 1, 2])

That looks like:

                                     amount  id
retailer_name store_number year                
Walmart       1944         11:23:19       5  10
              1945         11:23:19       5  10
CRV           1946         11:24:19       8  11
              1948         11:25:19       6  11
                           11:25:19       1  11
Walmart       1949         11:23:19       5  10
              1947         11:23:19      10  10
CRV           1948         11:23:19       6  11
              1949         11:23:19      12  11
              1947         11:23:19      11  10

I manage to filter on: stores_grouped.filter(lambda x: (len(x) == 1))

But when I want to filter on two conditions:

That my group has length one and id column is equals 10. Any idea ho do so ?

Upvotes: 5

Views: 1063

Answers (4)

piRSquared
piRSquared

Reputation: 294218

Thinking outside the box, use drop_duplicates with keep=False:

df.drop_duplicates(subset=['retailer_name', 'store_number', 'year'], keep=False) \
    .query('id == 10')

enter image description here


Timing

enter image description here

Upvotes: 1

jezrael
jezrael

Reputation: 862441

You can use:

print (stores_grouped.filter(lambda x: (len(x) == 1) & (x.id == 10).all()))
                                     amount  id
retailer_name store_number year                
Walmart       1944         11:23:19       5  10
              1945         11:23:19       5  10
              1949         11:23:19       5  10
              1947         11:23:19      10  10
CRV           1947         11:23:19      11  10

Upvotes: 3

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210832

i'd do it this way:

In [348]: stores_grouped.filter(lambda x: (len(x) == 1)).query('id == 10')
Out[348]:
                                     amount  id
retailer_name store_number year
Walmart       1944         11:23:19       5  10
              1945         11:23:19       5  10
              1949         11:23:19       5  10
              1947         11:23:19      10  10
CRV           1947         11:23:19      11  10

Upvotes: 1

EdChum
EdChum

Reputation: 393903

Actually as filter expects a scalar bool you can just add the condition in the lambda like a normal if style statement:

In [180]:
stores_grouped.filter(lambda x: (len(x) == 1 and x['id'] == 10))
​
Out[180]:
                                     amount  id
retailer_name store_number year                
Walmart       1944         11:23:19       5  10
              1945         11:23:19       5  10
              1949         11:23:19       5  10
              1947         11:23:19      10  10
CRV           1947         11:23:19      11  10

Upvotes: 3

Related Questions