Reputation: 2117
I am pulling a chunk of data within a range of time. It is pulling date and times from column recvd_dttm. It takes all the data starting from a year ago. I want to modify it so that it can pull a month or a day, but pd.DateOffset(months=1) is giving a KeyError:1
error. I get the same error if I change it to days=7. But it works just fine with years=1. What is going on here?
df = pd.read_csv('MYDATA.csv')
# filter by countries with at least one medal and sort
df['recvd_dttm'] = pd.to_datetime(df['recvd_dttm'])
#Only retrieve data before now (ignore typos that are future dates)
mask = df['recvd_dttm'] <= datetime.datetime.now()
df = df.loc[mask]
# get first and last datetime for final week of data
range_max = df['recvd_dttm'].max()
range_min = range_max - pd.DateOffset(years=1)
# take slice with final week of data
df = df[(df['recvd_dttm'] >= range_min) &
(df['recvd_dttm'] <= range_max)]
EDIT: The problem was coming from elsewhere in the code!
Upvotes: 1
Views: 2612
Reputation: 24742
You can use the offset family from pd.tseries.offsets
. Below is the sample code.
import pandas as pd
import datetime
# your data
# ================================
df = pd.read_csv('/home/Jian/Downloads/MOCK_DATA.csv', usecols=[1, 4])
df['recvd_dttm'] = pd.to_datetime(df['recvd_dttm'])
mask = df['recvd_dttm'] <= datetime.datetime.now()
df = df.loc[mask]
# flexible offsets
# =======================================
print(range_max)
2015-07-14 16:52:58
# for 1 month: currently there is a bug
# range_min_month = range_max - pd.tseries.offsets.MonthOffset(1)
# for 1 week
range_min_week = range_max - pd.tseries.offsets.Week(1)
print(range_min_week)
2015-07-07 16:52:58
# for 5 days
range_min_day = range_max - pd.tseries.offsets.Day(5)
print(range_min_day)
2015-07-09 16:52:58
Upvotes: 1
Reputation: 545
Have you consider using Unix Epoch Time instead of a date formatted in a lesser manner? There is a well documented answer for converting to Unix Time, and dealing with the sort of offset in the question seems like it would be a lot easier as sliding ranges are simpler to implement with a more or less continuous sequence of real numeric values.
Upvotes: 0
Reputation: 14415
Have you tried being more explicit with what pd.DateOffset is acting on?
For example:
range_max = df['recvd_dttm'].max()
range_min = range_max - (df['recvd_dttm']+pd.DateOffset(years=1))
Then substitute month and days values.
Upvotes: 1