jenryb
jenryb

Reputation: 2117

DateOffset from Year to Month and Week

I am pulling a chunk of data within a range of time. It is pulling date and times from column recvd_dttm. It takes all the data starting from a year ago. I want to modify it so that it can pull a month or a day, but pd.DateOffset(months=1) is giving a KeyError:1 error. I get the same error if I change it to days=7. But it works just fine with years=1. What is going on here?

df = pd.read_csv('MYDATA.csv')

# filter by countries with at least one medal and sort
df['recvd_dttm'] = pd.to_datetime(df['recvd_dttm'])

#Only retrieve data before now (ignore typos that are future dates)
mask = df['recvd_dttm'] <= datetime.datetime.now()
df = df.loc[mask]
# get first and last datetime for final week of data

range_max = df['recvd_dttm'].max()
range_min = range_max - pd.DateOffset(years=1)

# take slice with final week of data
df = df[(df['recvd_dttm'] >= range_min) & 
               (df['recvd_dttm'] <= range_max)]

EDIT: The problem was coming from elsewhere in the code!

Upvotes: 1

Views: 2612

Answers (4)

Jean-Michel Provencher
Jean-Michel Provencher

Reputation: 103

Try using timedelta instead of DateOffset

Upvotes: 0

Jianxun Li
Jianxun Li

Reputation: 24742

You can use the offset family from pd.tseries.offsets. Below is the sample code.

import pandas as pd
import datetime

# your data
# ================================
df = pd.read_csv('/home/Jian/Downloads/MOCK_DATA.csv', usecols=[1, 4])

df['recvd_dttm'] = pd.to_datetime(df['recvd_dttm'])
mask = df['recvd_dttm'] <= datetime.datetime.now()
df = df.loc[mask]


# flexible offsets
# =======================================
print(range_max)

2015-07-14 16:52:58

# for 1 month: currently there is a bug
# range_min_month = range_max - pd.tseries.offsets.MonthOffset(1)

# for 1 week
range_min_week = range_max - pd.tseries.offsets.Week(1)
print(range_min_week)

2015-07-07 16:52:58

# for 5 days
range_min_day = range_max - pd.tseries.offsets.Day(5)
print(range_min_day)

2015-07-09 16:52:58

Upvotes: 1

Matt
Matt

Reputation: 545

Have you consider using Unix Epoch Time instead of a date formatted in a lesser manner? There is a well documented answer for converting to Unix Time, and dealing with the sort of offset in the question seems like it would be a lot easier as sliding ranges are simpler to implement with a more or less continuous sequence of real numeric values.

Upvotes: 0

AZhao
AZhao

Reputation: 14415

Have you tried being more explicit with what pd.DateOffset is acting on?

For example:

range_max = df['recvd_dttm'].max()
range_min = range_max - (df['recvd_dttm']+pd.DateOffset(years=1))

Then substitute month and days values.

Upvotes: 1

Related Questions