user12625679

Reputation: 696

Remove rows based on date column [Pandas]

I would like to filter for customer_ids that first appear on or after a certain date (in this case 2019-01-10) and then create a new df listing those new customers.

df

date          customer_id

2019-01-01    429492
2019-01-01    344343
2019-01-01    949222
2019-01-10    429492
2019-01-10    344343
2019-01-10    129292

Output df

customer_id
129292

This is what I have tried so far, but it also returns customer_ids that were active before 10 January 2019:

s = df.loc[df["date"]>="2019-01-10", "customer_id"]

df_new = df[df["customer_id"].isin(s)]
df_new

Upvotes: 0

Views: 423

Answers (4)

JuanB

Reputation: 31

If your 'date' column holds datetime objects, you just have to do:

from datetime import datetime

df_new = df[df['date'] >= datetime(2019, 1, 10)]['customer_id']

If your 'date' column doesn't contain datetime objects, convert it first with the to_datetime method:

df['date'] = pd.to_datetime(df['date'])

And then apply the methodology described above.

Upvotes: 0

user8560167

Reputation:

"then create a new df with a list of new customers": in this case your output is empty, because 2019-01-10 is the last date, so there are no new customers after it.

But if you want the customers that appear on or after a certain date:

df=pd.DataFrame({
    'date':['2019-01-01','2019-01-01','2019-01-01',
            '2019-01-10','2019-01-10','2019-01-10'],
    'customer_id':[429492,344343,949222,429492,344343,129292]
})
certain_date=pd.to_datetime('2019-01-10')
df.date=pd.to_datetime(df.date)
df=df[
    df.date>=certain_date
]
print(df)


           date  customer_id
3 2019-01-10       429492
4 2019-01-10       344343
5 2019-01-10       129292

Upvotes: 0

jezrael

Reputation: 862471

You can use boolean indexing, filtering with Series.isin to exclude customers that already appeared before the cutoff:

df["date"] = pd.to_datetime(df["date"])

mask1 = df["date"]>="2019-01-10"
mask2 = df["customer_id"].isin(df.loc[~mask1,"customer_id"])

df = df.loc[mask1 & ~mask2, ['customer_id']]
print(df)
   customer_id
5       129292
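
Another way to express "first appears on or after the cutoff" is to compute each customer's earliest date and compare that to the cutoff. This is a minimal sketch of that groupby-based alternative, not the isin approach above; it assumes the sample data from the question, and the names first_seen, cutoff and new_customers are illustrative only:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-01", "2019-01-01",
             "2019-01-10", "2019-01-10", "2019-01-10"],
    "customer_id": [429492, 344343, 949222, 429492, 344343, 129292],
})
df["date"] = pd.to_datetime(df["date"])

cutoff = pd.Timestamp("2019-01-10")  # assumed cutoff, inclusive

# Earliest date on which each customer appears.
first_seen = df.groupby("customer_id")["date"].min()

# Keep customers whose first appearance is on or after the cutoff,
# and turn the resulting index back into a one-column DataFrame.
new_customers = first_seen[first_seen >= cutoff].index.to_frame(index=False)
print(new_customers)
```

Because each customer is reduced to a single row first, the result has no duplicate customer_ids even if a new customer appears several times after the cutoff.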

Upvotes: 1

Vasil Yordanov

Reputation: 417

df['date'] = pd.to_datetime(df['date'])

cutoff = pd.to_datetime('2019-01-10')
mask = df['date'] >= cutoff

customers_before = df.loc[~mask, 'customer_id'].unique().tolist()
customers_after = df.loc[mask, 'customer_id'].unique().tolist()

result = set(customers_after) - set(customers_before)
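
The set difference above yields a Python set, while the question asks for a new df. The following sketch shows one way to get there, assuming the sample data from the question; df_new is the name used in the question, and sorting the ids is an added choice to get a stable row order:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-01", "2019-01-01",
             "2019-01-10", "2019-01-10", "2019-01-10"],
    "customer_id": [429492, 344343, 949222, 429492, 344343, 129292],
})
df["date"] = pd.to_datetime(df["date"])

cutoff = pd.to_datetime("2019-01-10")
mask = df["date"] >= cutoff

# Customers seen before the cutoff vs. on or after it.
customers_before = set(df.loc[~mask, "customer_id"])
customers_after = set(df.loc[mask, "customer_id"])
result = customers_after - customers_before

# Sets are unordered, so sort before building the DataFrame.
df_new = pd.DataFrame({"customer_id": sorted(result)})
print(df_new)
```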

Upvotes: 0
