Reputation: 696
I would like to filter for customer_ids that first appear on or after a certain date, in this case 2019-01-10,
and then create a new df listing those new customers.
df
date customer_id
2019-01-01 429492
2019-01-01 344343
2019-01-01 949222
2019-01-10 429492
2019-01-10 344343
2019-01-10 129292
Output df
customer_id
129292
This is what I have tried so far, but it also returns customer_ids that were already active before 10 January 2019:
s = df.loc[df["date"]>="2019-01-10", "customer_id"]
df_new = df[df["customer_id"].isin(s)]
df_new
Upvotes: 0
Views: 423
Reputation: 31
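If your 'date' column has datetime objects, you just have to do:
from datetime import datetime
df_new = df[df['date'] >= datetime(2019, 1, 10)]['customer_id']
Note that this returns every customer_id active on or after that date, including customers who also appeared earlier.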
If your 'date' column doesn't contain datetime objects, convert it first with the to_datetime method:
df['date'] = pd.to_datetime(df['date'])
Then apply the same filter as above.
Upvotes: 0
Reputation:
"then create a new df with a list of new customers": read strictly, your output would be empty, because 2019-01-10 is the last date and no new customers appear after it.
But if you want the list of customers on or after a certain date:
df=pd.DataFrame({
'date':['2019-01-01','2019-01-01','2019-01-01',
'2019-01-10','2019-01-10','2019-01-10'],
'customer_id':[429492,344343,949222,429492,344343,129292]
})
certain_date=pd.to_datetime('2019-01-10')
df.date=pd.to_datetime(df.date)
df=df[
df.date>=certain_date
]
print(df)
date customer_id
3 2019-01-10 429492
4 2019-01-10 344343
5 2019-01-10 129292
Upvotes: 0
Reputation: 862471
You can use boolean indexing, filtering with Series.isin:
df["date"] = pd.to_datetime(df["date"])
mask1 = df["date"]>="2019-01-10"
mask2 = df["customer_id"].isin(df.loc[~mask1,"customer_id"])
df = df.loc[mask1 & ~mask2, ['customer_id']]
print (df)
customer_id
5 129292
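As a cross-check on the mask approach above, here is a minimal sketch of a different technique, not the one used in this answer: compute each customer's first date with groupby(...).min() and keep only those whose first date is on or after the cutoff. The names first_seen, cutoff and new_customers are hypothetical, and the data is the sample from the question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-01", "2019-01-01",
             "2019-01-10", "2019-01-10", "2019-01-10"],
    "customer_id": [429492, 344343, 949222, 429492, 344343, 129292],
})
df["date"] = pd.to_datetime(df["date"])
cutoff = pd.Timestamp("2019-01-10")  # hypothetical name for the cutoff date

# Earliest date on which each customer_id appears
first_seen = df.groupby("customer_id")["date"].min()

# Keep customers whose first appearance is on or after the cutoff
new_customers = first_seen[first_seen >= cutoff].reset_index()[["customer_id"]]
print(new_customers)
```

On the sample data both approaches should select the same single customer. The groupby version may be easier to extend if you also want to keep each customer's first date.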
Upvotes: 1
Reputation: 417
df['date'] = pd.to_datetime(df['date'])
cutoff = pd.to_datetime('2019-01-10')
mask = df['date'] >= cutoff
customers_before = df.loc[~mask, 'customer_id'].unique().tolist()
customers_after = df.loc[mask, 'customer_id'].unique().tolist()
result = set(customers_after) - set(customers_before)
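The result above is a Python set, whereas the question asks for a DataFrame. A minimal sketch of one way to wrap it, using the sample data from the question; df_new is a hypothetical name, and the sort is only there to make the row order deterministic.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-01", "2019-01-01",
             "2019-01-10", "2019-01-10", "2019-01-10"],
    "customer_id": [429492, 344343, 949222, 429492, 344343, 129292],
})
df["date"] = pd.to_datetime(df["date"])
cutoff = pd.to_datetime("2019-01-10")
mask = df["date"] >= cutoff

# Customers seen before the cutoff vs. on or after it
customers_before = set(df.loc[~mask, "customer_id"])
customers_after = set(df.loc[mask, "customer_id"])
result = customers_after - customers_before

# Wrap the set in a one-column DataFrame (sorted for a stable order)
df_new = pd.DataFrame({"customer_id": sorted(result)})
print(df_new)
```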
Upvotes: 0