Neil
Neil

Reputation: 8247

how to subset pandas dataframe on date

I have a pandas DataFrame like this..

order_id buyer_id item_id                time        
   537       79      93 2016-01-04 10:20:00    
   540      191      93 2016-01-04 10:30:00    
   556      251      82 2016-01-04 13:39:00  
   589      191     104 2016-01-05 10:59:00   
   596      251      99 2016-01-05 13:48:00    
   609       79     106 2016-01-06 10:39:00    
   611      261      97 2016-01-06 10:50:00   
   680       64     135 2016-01-11 11:58:00  
   681      261     133 2016-01-11 12:03:00    
   682      309     135 2016-01-11 12:08:00   

I want to get all the buyer_ids present before 6th jan 2016 but not after 6th Jan 2016

so, it should return me buyer_id 79

I am doing following in Python.

df.buyer_id[(df['time'] < '2016-01-06')]

This returns me all the buyer ids before 6th jan 2016 but how to check for the condition if its not present after 6th jan ? Please help

Upvotes: 0

Views: 64

Answers (2)

Stefan
Stefan

Reputation: 42905

You could use:

df.groupby('buyer_id').apply(lambda x: True if (x.time < '01-06-2016').any() and not (x.time > '01-06-2016').any() else False)

buyer_id
64     False
79     False
191     True
251     True
261    False
309    False
dtype: bool

Upvotes: 1

Anton Protopopov
Anton Protopopov

Reputation: 31692

IIUC you could use isin method to achieve what you want:

df.time = pd.to_datetime(df.time)

In [52]: df
Out[52]:
   order_id  buyer_id  item_id                time
0       537        79       93 2016-01-04 10:20:00
1       540       191       93 2016-01-04 10:30:00
2       556       251       82 2016-01-04 13:39:00
3       589       191      104 2016-01-05 10:59:00
4       596       251       99 2016-01-05 13:48:00
5       609        79      106 2016-01-06 10:39:00
6       611       261       97 2016-01-06 10:50:00
7       680        64      135 2016-01-11 11:58:00
8       681       261      133 2016-01-11 12:03:00
9       682       309      135 2016-01-11 12:08:00


exclude = df.buyer_id[(df['time'] > '2016-01-06')]
select = df.buyer_id[(df['time'] < '2016-01-06')]

In [53]: select
Out[53]:
0     79
1    191
2    251
3    191
4    251
Name: buyer_id, dtype: int64

In [54]: exclude
Out[54]:
5     79
6    261
7     64
8    261
9    309
Name: buyer_id, dtype: int64

In [55]: select[~select.isin(exclude)]
Out[55]:
1    191
2    251
3    191
4    251
Name: buyer_id, dtype: int64

Upvotes: 2

Related Questions