Satya
Satya

Reputation: 5907

how to slice a dataframe having date field as index?

In my dataframe I change my index to a date field as

df.index = df.TRX_DATE   # transaction date and type is class pandas.core.series.Series'

Now I want to slice my dataframe on base of two dates or by any date difference.

But i am getting errors.

# currentdate is today date
startdate = currentdate - timedelta(days=30)

dflast30 = df.loc[startdate:currentdate]  # error

Tried doing by creating mask

mask = (df['TRX_DATE'] >= startdate) & (df['TRX_DATE'] <= currentdate )
dflast30 = df.loc[mask]

dflast30 = df.loc[mask]

TypeError: unorderable types: str() > datetime.datetime()

Then I tried doing truncating like:

dflast30 = df.truncate(before = currentdate, after = startdate)

And I get the same error.

I am confused. And I need an advise on these points:

  1. Can I change the index(TRX_DATE field) to datetime type?

  2. Or I should make that string field type.

  3. Or I should let the unassigned index as it was and do the search on date field for my current requirement.

  4. Or give a example how can i make a datefield as index and slice for a date-range and please mention the output also.

Upvotes: 4

Views: 1461

Answers (1)

jezrael
jezrael

Reputation: 862681

I think your first approach is fine.

If you want copy column TRX_DATE to index:

df.index = pd.to_datetime(df['TRX_DATE'])

If you don't want copy, only set column TRX_DATE to index:

df = df.set_index(['TRX_DATE'])

There is my demo:

import pandas as pd
import numpy as np
import io
import datetime as dt

temp=u"""TRX_DATE;A
2013-07-05;1
2013-08-06;1
2015-09-05;2
2015-10-08;2
2015-11-05;2
2015-11-25;2
2015-12-06;3"""

df = pd.read_csv(io.StringIO(temp), sep=";", parse_dates=[0])
print df
#    TRX_DATE  A
#0 2013-07-05  1
#1 2013-08-06  1
#2 2015-09-05  2
#3 2015-10-08  2
#4 2015-11-05  2
#5 2015-11-25  2
#6 2015-12-06  3

print df.dtypes
#TRX_DATE    datetime64[ns]
#A                    int64
#dtype: object

#copy column TRX_DATE to index
#df.index = pd.to_datetime(df['TRX_DATE'])
#no copy, only set column TRX_DATE to index
df = df.set_index(['TRX_DATE'])
print df
#            A
#TRX_DATE
#2013-07-05  1
#2013-08-06  1
#2015-09-05  2
#2015-10-08  2
#2015-11-05  2
#2015-11-25  2
#2015-12-06  3

currentdate = dt.date.today()
print currentdate
#2015-11-06

startdate = currentdate - pd.Timedelta(days=30)
print startdate
#2015-10-07

dflast30 = df.loc[startdate:currentdate]
print dflast30
#            A
#TRX_DATE
#2015-10-08  2
#2015-11-05  2

dflast30 = dflast30.reset_index()
print dflast30
#    TRX_DATE  A
#0 2015-10-08  2
#1 2015-11-05  2

Different approach, where you can create subset of your df. No need set datetimeindex.

import pandas as pd
import numpy as np
import io
import datetime as dt

temp=u"""TRX_DATE;A
2013-07-05;1
2013-08-06;1
2015-09-05;2
2015-10-08;2
2015-11-05;2
2015-11-25;2
2015-12-06;3"""

df = pd.read_csv(io.StringIO(temp), sep=";", parse_dates=[0])
print df
#    TRX_DATE  A
#0 2013-07-05  1
#1 2013-08-06  1
#2 2015-09-05  2
#3 2015-10-08  2
#4 2015-11-05  2
#5 2015-11-25  2
#6 2015-12-06  3

print df.dtypes
#TRX_DATE    datetime64[ns]
#A                    int64
#dtype: object

currentdate = dt.date.today()
print currentdate
#2015-11-06

startdate = currentdate - pd.Timedelta(days=30)
print startdate
#2015-10-07

dflast30 = df[(df.TRX_DATE >= startdate) & (df.TRX_DATE <= currentdate)]
print dflast30
#    TRX_DATE  A
#3 2015-10-08  2
#4 2015-11-05  2

Upvotes: 2

Related Questions