Filtering data by date in a dataframe

Question

I have some code below that imports some time series csv files into a dataframe and changes the column names of the dataframe to 'date' for the column with the dates of the time series and the other columns are set to the names of the files they have come from. All good so far. Now I want to read in data that is between two pre set dates. This is where I am having a problem. I cannot get the code to only return the dataframe from startDate to endDate and remove the other rows of data.

I have had various goes at this but I cannot get the filter to work. Please see the current version of my code below:

def getTimeseriesData4(DataPath,columnNum,startDate,endDate):
    colNames = ['date']

    path = DataPath
    filePath = path, "*.csv"
    allfiles = glob.glob(os.path.join(path, "*.csv"))
    for fname in allfiles:
        name = os.path.splitext(fname)[0]
        name = os.path.split(name)[1]

        colNames.append(name)

    dataframes = [pd.read_csv(fname, header=None,usecols=[0,columnNum]) for fname in allfiles]



    #this is the part where I am trying to filter out the data I do not need.  So dataframes would only have data between the startDate and the endDate
    dataframes = dataframes.set_index(['date'])
    print(dataframes.loc[startDate:endDate])



    timeseriesData = reduce(partial(pd.merge, on=0, how='outer'), dataframes)
    timeseriesData.columns=colNames

    return timeseriesData

below is a sample of the data I am importing

          date  BBG.BBG.AUDEUR.FX  BBG.BBG.CADEUR.FX  BBG.BBG.CHFEUR.FX  \
0   01/01/2001             0.5932             0.7084             0.6588   
1   02/01/2001             0.5893             0.7038             0.6576   
2   03/01/2001             0.6000             0.7199             0.6610   
3   04/01/2001             0.5972             0.7021             0.6563   
4   05/01/2001             0.5973             0.6972             0.6532   
5   08/01/2001             0.5987             0.7073             0.6562   
6   09/01/2001             0.5972             0.7095             0.6565   
7   10/01/2001             0.5923             0.7105             0.6548   
8   11/01/2001             0.5888             0.7029             0.6512   
9   12/01/2001             0.5861             0.7013             0.6494   
10  15/01/2001             0.5870             0.7064             0.6492   
11  16/01/2001             0.5892             0.7047             0.6497   
12  17/01/2001             0.5912             0.7070             0.6507   
13  18/01/2001             0.5920             0.7015             0.6544   
14  19/01/2001             0.5953             0.7083             0.6535

so if I set the startDate to be '02/01/2001' and the endDate to be '05/01/2001'

the code would return:

          date  BBG.BBG.AUDEUR.FX  BBG.BBG.CADEUR.FX  BBG.BBG.CHFEUR.FX  \
0   02/01/2001             0.5893             0.7038             0.6576   
1   03/01/2001             0.6000             0.7199             0.6610   
2   04/01/2001             0.5972             0.7021             0.6563   
3   05/01/2001             0.5973             0.6972             0.6532

So instead of returning all the data imported from the CSV files, the code would return data between the startDate and the endDate.

EdChum · Accepted Answer

Convert the dtype to datetime using pd.to_datetime:

In [98]:
df['date'] = pd.to_datetime(df['date'])
df.info()


Int64Index: 15 entries, 0 to 14
Data columns (total 4 columns):
date                 15 non-null datetime64[ns]
BBG.BBG.AUDEUR.FX    15 non-null float64
BBG.BBG.CADEUR.FX    15 non-null float64
BBG.BBG.CHFEUR.FX    15 non-null float64
dtypes: datetime64[ns](1), float64(3)
memory usage: 600.0 bytes

You can then pass your dates as filtering criteria to create a boolean mask:

In [97]:
df[(df['date'] >= '02/01/2001') & (df['date'] <= '05/01/2001')]

Out[97]:
        date  BBG.BBG.AUDEUR.FX  BBG.BBG.CADEUR.FX  BBG.BBG.CHFEUR.FX
1 2001-02-01             0.5893             0.7038             0.6576
2 2001-03-01             0.6000             0.7199             0.6610
3 2001-04-01             0.5972             0.7021             0.6563
4 2001-05-01             0.5973             0.6972             0.6532

Filtering data by date in a dataframe

Answers (1)

Related Questions