Reputation: 1893
I'm working with the NYC MVA dataset. I've combined the CRASH DATE
and CRASH TIME
columns into a single column with format 2017-06-26 22:00:00
. I'd now like to add a categorical column based on seasons. In order to do this, I'm looking to apply a mask to each season name and fill the column based on that, using the following basic structure:
df[df['CRASH TIME'].dt.date < dt.date(:,1,2)]
The problem is that the datetime
date Timestamp requires a year input; the dataset spans a number of years. I would like to select all years, not any given year. In other words, I'd like to just select the month and date, and not the year. Is there a way to do that using datetime
Timestamps?
Upvotes: 1
Views: 48
Reputation: 419
assuming your using pandas for manipulating the data you could do something like this
df['day'] = df['CRASH TIME'].apply(lambda r:r.day)
df['month'] = df['CRASH TIME'].apply(lambda r:r.month)
Then you can combine them or work with them just as they are.
Upvotes: 2
Reputation: 9359
I'm not sure there's a way how to directly compare only a part of date
, but you can extract the month
and day
into a tuple and compare them that way:
month_day_left = (df['CRASH TIME'].dt.date.month, df['CRASH TIME'].dt.date.day)
month_day_right = (dt.date.month, dt.date.day)
(2, 1) < (2, 2) # True
(1, 10) < (2, 1) # True
(2, 1) < (1, 30) # False
you can eventually wrap this comparison into a custom function and use it that way:
df[ is_earlier(df['CRASH TIME'].dt, dt)]
Upvotes: 1