Stacey
Stacey

Reputation: 5097

Filtering data by date in a dataframe

I have some code below that imports some time series csv files into a dataframe and changes the column names of the dataframe to 'date' for the column with the dates of the time series and the other columns are set to the names of the files they have come from. All good so far. Now I want to read in data that is between two pre set dates. This is where I am having a problem. I cannot get the code to only return the dataframe from startDate to endDate and remove the other rows of data.

I have had various goes at this but I cannot get the filter to work. Please see the current version of my code below:

def getTimeseriesData4(DataPath,columnNum,startDate,endDate):
    colNames = ['date']

    path = DataPath
    filePath = path, "*.csv"
    allfiles = glob.glob(os.path.join(path, "*.csv"))
    for fname in allfiles:
        name = os.path.splitext(fname)[0]
        name = os.path.split(name)[1]

        colNames.append(name)

    dataframes = [pd.read_csv(fname, header=None,usecols=[0,columnNum]) for fname in allfiles]



    #this is the part where I am trying to filter out the data I do not need.  So dataframes would only have data between the startDate and the endDate
    dataframes = dataframes.set_index(['date'])
    print(dataframes.loc[startDate:endDate])



    timeseriesData = reduce(partial(pd.merge, on=0, how='outer'), dataframes)
    timeseriesData.columns=colNames

    return timeseriesData  

below is a sample of the data I am importing

          date  BBG.BBG.AUDEUR.FX  BBG.BBG.CADEUR.FX  BBG.BBG.CHFEUR.FX  \
0   01/01/2001             0.5932             0.7084             0.6588   
1   02/01/2001             0.5893             0.7038             0.6576   
2   03/01/2001             0.6000             0.7199             0.6610   
3   04/01/2001             0.5972             0.7021             0.6563   
4   05/01/2001             0.5973             0.6972             0.6532   
5   08/01/2001             0.5987             0.7073             0.6562   
6   09/01/2001             0.5972             0.7095             0.6565   
7   10/01/2001             0.5923             0.7105             0.6548   
8   11/01/2001             0.5888             0.7029             0.6512   
9   12/01/2001             0.5861             0.7013             0.6494   
10  15/01/2001             0.5870             0.7064             0.6492   
11  16/01/2001             0.5892             0.7047             0.6497   
12  17/01/2001             0.5912             0.7070             0.6507   
13  18/01/2001             0.5920             0.7015             0.6544   
14  19/01/2001             0.5953             0.7083             0.6535 

so if I set the startDate to be '02/01/2001' and the endDate to be '05/01/2001'

the code would return:

          date  BBG.BBG.AUDEUR.FX  BBG.BBG.CADEUR.FX  BBG.BBG.CHFEUR.FX  \
0   02/01/2001             0.5893             0.7038             0.6576   
1   03/01/2001             0.6000             0.7199             0.6610   
2   04/01/2001             0.5972             0.7021             0.6563   
3   05/01/2001             0.5973             0.6972             0.6532 

So instead of returning all the data imported from the CSV files, the code would return data between the startDate and the endDate.

Upvotes: 2

Views: 2977

Answers (1)

EdChum
EdChum

Reputation: 394041

Convert the dtype to datetime using pd.to_datetime:

In [98]:
df['date'] = pd.to_datetime(df['date'])
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 15 entries, 0 to 14
Data columns (total 4 columns):
date                 15 non-null datetime64[ns]
BBG.BBG.AUDEUR.FX    15 non-null float64
BBG.BBG.CADEUR.FX    15 non-null float64
BBG.BBG.CHFEUR.FX    15 non-null float64
dtypes: datetime64[ns](1), float64(3)
memory usage: 600.0 bytes

You can then pass your dates as filtering criteria to create a boolean mask:

In [97]:
df[(df['date'] >= '02/01/2001') & (df['date'] <= '05/01/2001')]

Out[97]:
        date  BBG.BBG.AUDEUR.FX  BBG.BBG.CADEUR.FX  BBG.BBG.CHFEUR.FX
1 2001-02-01             0.5893             0.7038             0.6576
2 2001-03-01             0.6000             0.7199             0.6610
3 2001-04-01             0.5972             0.7021             0.6563
4 2001-05-01             0.5973             0.6972             0.6532

Upvotes: 3

Related Questions