Reputation: 5907
In my dataframe
I change my index to a date field as
df.index = df.TRX_DATE # transaction date and type is class pandas.core.series.Series'
Now I want to slice my dataframe
on base of two dates or by any date difference.
But i am getting errors.
# currentdate is today date
startdate = currentdate - timedelta(days=30)
dflast30 = df.loc[startdate:currentdate] # error
Tried doing by creating mask
mask = (df['TRX_DATE'] >= startdate) & (df['TRX_DATE'] <= currentdate )
dflast30 = df.loc[mask]
dflast30 = df.loc[mask]
TypeError: unorderable types: str() > datetime.datetime()
Then I tried doing truncating like:
dflast30 = df.truncate(before = currentdate, after = startdate)
And I get the same error.
I am confused. And I need an advise on these points:
Can I change the index(TRX_DATE field) to datetime type?
Or I should make that string field type.
Or I should let the unassigned index as it was and do the search on date field for my current requirement.
Or give a example how can i make a datefield as index and slice for a date-range and please mention the output also.
Upvotes: 4
Views: 1461
Reputation: 862681
I think your first approach is fine.
If you want copy column TRX_DATE
to index:
df.index = pd.to_datetime(df['TRX_DATE'])
If you don't want copy, only set column TRX_DATE
to index:
df = df.set_index(['TRX_DATE'])
There is my demo:
import pandas as pd
import numpy as np
import io
import datetime as dt
temp=u"""TRX_DATE;A
2013-07-05;1
2013-08-06;1
2015-09-05;2
2015-10-08;2
2015-11-05;2
2015-11-25;2
2015-12-06;3"""
df = pd.read_csv(io.StringIO(temp), sep=";", parse_dates=[0])
print df
# TRX_DATE A
#0 2013-07-05 1
#1 2013-08-06 1
#2 2015-09-05 2
#3 2015-10-08 2
#4 2015-11-05 2
#5 2015-11-25 2
#6 2015-12-06 3
print df.dtypes
#TRX_DATE datetime64[ns]
#A int64
#dtype: object
#copy column TRX_DATE to index
#df.index = pd.to_datetime(df['TRX_DATE'])
#no copy, only set column TRX_DATE to index
df = df.set_index(['TRX_DATE'])
print df
# A
#TRX_DATE
#2013-07-05 1
#2013-08-06 1
#2015-09-05 2
#2015-10-08 2
#2015-11-05 2
#2015-11-25 2
#2015-12-06 3
currentdate = dt.date.today()
print currentdate
#2015-11-06
startdate = currentdate - pd.Timedelta(days=30)
print startdate
#2015-10-07
dflast30 = df.loc[startdate:currentdate]
print dflast30
# A
#TRX_DATE
#2015-10-08 2
#2015-11-05 2
dflast30 = dflast30.reset_index()
print dflast30
# TRX_DATE A
#0 2015-10-08 2
#1 2015-11-05 2
Different approach, where you can create subset of your df
. No need set datetimeindex
.
import pandas as pd
import numpy as np
import io
import datetime as dt
temp=u"""TRX_DATE;A
2013-07-05;1
2013-08-06;1
2015-09-05;2
2015-10-08;2
2015-11-05;2
2015-11-25;2
2015-12-06;3"""
df = pd.read_csv(io.StringIO(temp), sep=";", parse_dates=[0])
print df
# TRX_DATE A
#0 2013-07-05 1
#1 2013-08-06 1
#2 2015-09-05 2
#3 2015-10-08 2
#4 2015-11-05 2
#5 2015-11-25 2
#6 2015-12-06 3
print df.dtypes
#TRX_DATE datetime64[ns]
#A int64
#dtype: object
currentdate = dt.date.today()
print currentdate
#2015-11-06
startdate = currentdate - pd.Timedelta(days=30)
print startdate
#2015-10-07
dflast30 = df[(df.TRX_DATE >= startdate) & (df.TRX_DATE <= currentdate)]
print dflast30
# TRX_DATE A
#3 2015-10-08 2
#4 2015-11-05 2
Upvotes: 2