Reputation: 12781
I am trying to interpret a field as a date, truncate it to the month it falls in, shift it back by one month, and then represent the result as a date without a timestamp. I have ended up with this, which looks and feels too unwieldy:
df['DATE'].apply( lambda d: pd.to_datetime(pd.to_datetime(d).to_period('M').to_timestamp('M')\
- np.timedelta64(1,'M')).date())
The timestamps are strings in this format:
2012-09-01 00:00:00
Any ideas for a better way? Thanks.
Upvotes: 5
Views: 5680
Reputation: 13757
Well, you can avoid the apply and do it vectorized (I think that makes it a bit nicer):
print(df)
date x1
0 2010-01-01 00:00:00 10
1 2010-02-01 00:00:00 10
2 2010-03-01 00:00:00 10
3 2010-04-01 00:00:00 10
4 2010-04-01 00:00:00 5
5 2010-05-01 00:00:00 5
df['date'] = (pd.to_datetime(df['date']).values.astype('datetime64[M]')
- np.timedelta64(1,'M'))
print(df)
date x1
0 2009-12-01 10
1 2010-01-01 10
2 2010-02-01 10
3 2010-03-01 10
4 2010-03-01 5
5 2010-04-01 5
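Of course, the column will still have dtype datetime64[ns], since pandas converts datetime data to that dtype on assignment (pandas 2.0 and later may keep a coarser unit such as datetime64[s]).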
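For anyone who wants to run the vectorized version end to end, here is a minimal self-contained sketch. The sample frame is made up for illustration, and the explicit cast back to datetime64[ns] is my addition so the resulting dtype does not depend on the pandas version:

```python
import numpy as np
import pandas as pd

# Made-up sample data in the same string format as the question.
df = pd.DataFrame({
    "date": ["2010-01-01 00:00:00", "2010-02-01 00:00:00", "2010-03-15 00:00:00"],
    "x1": [10, 10, 5],
})

# Parse the strings, then truncate every value to month precision.
months = pd.to_datetime(df["date"]).values.astype("datetime64[M]")

# Step back one calendar month; this is exact because the unit is months.
prev_month_start = months - np.timedelta64(1, "M")

# Cast back to nanosecond precision before storing it in the frame.
df["date"] = prev_month_start.astype("datetime64[ns]")
print(df)
```

Note that the mid-month value (2010-03-15) also ends up on the first of the previous month, because the truncation happens before the subtraction.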
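Edit: Suppose you wanted the end of the previous month instead of the beginning of the previous month. Subtract one day from the month start instead (the output below comes from running this on the already-shifted column from the previous step):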
df['date'] = (pd.to_datetime(df['date']).values.astype('datetime64[M]')
- np.timedelta64(1,'D'))
print(df)
date x1
0 2009-11-30 10
1 2009-12-31 10
2 2010-01-31 10
3 2010-02-28 10
4 2010-02-28 5
5 2010-03-31 5
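As a sanity check, here is a small sketch of the same month-end trick applied to freshly parsed, unshifted strings. The sample values and variable names are made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up sample strings, not yet shifted by any earlier step.
dates = pd.Series(["2010-01-01 00:00:00", "2010-03-01 00:00:00"])

# Truncate to the month; each value now stands for the first day of its month.
month_start = pd.to_datetime(dates).values.astype("datetime64[M]")

# Subtracting one day from the first of a month gives the last day
# of the month before it (NumPy promotes the result to day precision).
prev_month_end = month_start - np.timedelta64(1, "D")

result = pd.Series(prev_month_end.astype("datetime64[ns]"))
print(result)
```

On these inputs the result is the last day of December 2009 and the last day of February 2010.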
Edit: Jeff points out that a more pandonic way is to make date a DatetimeIndex
and use a Date Offset. So something like:
df['date'] = pd.DatetimeIndex(df['date']) - pd.offsets.MonthBegin(1)
print(df)
date x1
0 2009-12-01 10
1 2010-01-01 10
2 2010-02-01 10
3 2010-03-01 10
4 2010-03-01 5
5 2010-04-01 5
Or month-ends:
df['date'] = pd.DatetimeIndex(df['date']) - pd.offsets.MonthEnd(1)
print(df)
date x1
0 2009-12-31 10
1 2010-01-31 10
2 2010-02-28 10
3 2010-03-31 10
4 2010-03-31 5
5 2010-04-30 5
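One caveat with the offsets: they roll to the nearest anchor, so subtracting MonthBegin(1) from a mid-month date lands on the first of the same month rather than the previous one. If the data might contain mid-month dates, one option is to go through monthly periods, as the question already does, but vectorized. This is only a sketch with made-up sample values; it also converts to plain date objects, which is what the question asked for:

```python
import pandas as pd

# Made-up sample strings, including one mid-month date.
dates = pd.Series(["2010-01-01 00:00:00", "2010-03-15 00:00:00"])

# Offsets roll back to the nearest month start, so a mid-month
# timestamp stays within its own month.
rolled = pd.Timestamp("2010-03-15") - pd.offsets.MonthBegin(1)

# Convert to monthly periods and step back one period.
periods = pd.to_datetime(dates).dt.to_period("M") - 1

# Back to timestamps (start of each period), then to plain datetime.date objects.
prev_month = periods.dt.to_timestamp().dt.date
print(prev_month)
```

The resulting column has object dtype (it holds datetime.date values), so it is best done as the last step before display or export.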
Upvotes: 9