Reputation: 396
I am trying to split a series of timestamps into groups:
Define variables:
Very old = Date < '20190101'
Current = Today's date as %Y-%m (Year-Month)
Conditions
1. timestamp < very old
2. Very old < timestamp < current
3. timestamp = current
4. timestamp > current
Series that had been split from original DataFrame:
timestamp_dict = \
{0: Timestamp('2019-05-01 00:00:00'),
1: Timestamp('2019-05-01 00:00:00'),
2: Timestamp('2018-12-01 00:00:00'),
3: Timestamp('2019-05-01 00:00:00'),
4: Timestamp('2019-05-01 00:00:00'),
5: Timestamp('2019-05-01 00:00:00'),
6: Timestamp('2019-04-01 00:00:00'),
7: Timestamp('2019-08-01 00:00:00')}
The datetimes are stored as datetime64[ns].
I have a feeling converting current timestamp to str is wrong, however, I am not sure how to extract the current timestamp as format %Y-%m
.
I have an idea on accessing current date as month, year integers and then concatenating but then I would probably run into zero padding issues:
_month = dt.datetime.today().month
_year = dt.datetime.today().year
# Would run into zero padding for months 1-9:
current = str(_year) + str(_month)
Here, I try to generate a new DataFrame column using np.select and specifying my required conditions.
import datetime as dt
current = dt.datetime.today().strftime('%Y-%m')
veryold = '20190101'
conditions = [
df.Delivery < veryold,
(df.Delivery >= veryold | (df.Delivery < current),
df.Delivery == current,
df.Delivery > current
]
outcome = [
'Very old',
'Old',
'Current',
'Future'
]
df['New'] = np.select(conditions, outcome)
df.New
My expected output is to have an additional column of labeled outcomes in my DataFrame.
Upvotes: 1
Views: 107
Reputation: 862511
Idea is create month periods by Series.dt.to_period
for possible compare by YYYY-MM
:
current = pd.Timestamp(pd.datetime.today()).to_period('M')
veryold = pd.Timestamp('20190101')
conditions = [
df.Delivery < veryold,
(df.Delivery >= veryold) | (df.Delivery.dt.to_period('M') < current),
df.Delivery.dt.to_period('M') == current,
df.Delivery.dt.to_period('M') > current]
outcome = [
'Very old',
'Old',
'Current',
'Future'
]
df = pd.Series(pd.Timestamp_dict).to_frame('Delivery')
df['New'] = np.select(conditions, outcome)
print(df)
Delivery New
0 2019-05-01 Old
1 2019-05-01 Old
2 2018-12-01 Very old
3 2019-05-01 Old
4 2019-05-01 Old
5 2019-05-01 Old
6 2019-04-01 Old
7 2019-08-01 Old
Upvotes: 1