yongsheng
yongsheng

Reputation: 396

Conditional date filter

I am trying to split a series of timestamps into groups:

Define variables:

Very old = Date < '20190101'
Current = Today's date as %Y-%m (Year-Month)

Conditions

1. timestamp < very old
2. Very old < timestamp < current
3. timestamp = current
4. timestamp > current

Series that had been split from original DataFrame:

timestamp_dict = \
{0: Timestamp('2019-05-01 00:00:00'),
 1: Timestamp('2019-05-01 00:00:00'),
 2: Timestamp('2018-12-01 00:00:00'),
 3: Timestamp('2019-05-01 00:00:00'),
 4: Timestamp('2019-05-01 00:00:00'),
 5: Timestamp('2019-05-01 00:00:00'),
 6: Timestamp('2019-04-01 00:00:00'),
 7: Timestamp('2019-08-01 00:00:00')}

The datetimes are stored as datetime64[ns]. I have a feeling converting current timestamp to str is wrong, however, I am not sure how to extract the current timestamp as format %Y-%m.

I have an idea on accessing current date as month, year integers and then concatenating but then I would probably run into zero padding issues:

_month = dt.datetime.today().month
_year = dt.datetime.today().year

# Would run into zero padding for months 1-9:
current = str(_year) + str(_month)  

Here, I try to generate a new DataFrame column using np.select and specifying my required conditions.

import datetime as dt

current = dt.datetime.today().strftime('%Y-%m')
veryold = '20190101'

conditions = [
    df.Delivery < veryold,
    (df.Delivery >= veryold | (df.Delivery < current),
    df.Delivery == current,
    df.Delivery > current
]

outcome = [
    'Very old',
    'Old',
    'Current',
    'Future'
]

df['New'] = np.select(conditions, outcome)

df.New

My expected output is to have an additional column of labeled outcomes in my DataFrame.

Upvotes: 1

Views: 107

Answers (1)

jezrael
jezrael

Reputation: 862511

Idea is create month periods by Series.dt.to_period for possible compare by YYYY-MM:

current = pd.Timestamp(pd.datetime.today()).to_period('M')
veryold = pd.Timestamp('20190101')

conditions = [
    df.Delivery < veryold,
    (df.Delivery >= veryold) | (df.Delivery.dt.to_period('M') < current),
    df.Delivery.dt.to_period('M') == current,
    df.Delivery.dt.to_period('M') > current]

outcome = [
    'Very old',
    'Old',
    'Current',
    'Future'
]

df = pd.Series(pd.Timestamp_dict).to_frame('Delivery')
df['New'] = np.select(conditions, outcome)
print(df)
    Delivery       New
0 2019-05-01       Old
1 2019-05-01       Old
2 2018-12-01  Very old
3 2019-05-01       Old
4 2019-05-01       Old
5 2019-05-01       Old
6 2019-04-01       Old
7 2019-08-01       Old

Upvotes: 1

Related Questions