Combining separate daily CSVs in pandas

Question

I have a bunch of CSV files, each one named as the date it was collected, ie. :

2020-03-21.csv
2020-03-22.csv
2020-03-23.csv
etc....

I would like to create a single pandas DataFrame containing the data from all the CSVs, with a new date column listing the date that data is from. As a toy example:

Current, single CSV (eg. 2020-03-19.csv):

Country           X         Y

Australia        10        14
China            24        16
Canada           18        35

Desired result (combined DataFrame):

Country           X         Y         Date

Australia        10        14       2020-03-19
China            24        16       2020-03-19
Canada           18        35       2020-03-19
Australia        14        16       2020-03-20
China            23        17       2020-03-20
Canada           20        38       2020-03-20
Australia        25        19       2020-03-21
etc...

What is the best way to achieve this in pandas? I have tried a couple methods using pd.merge and pd.concat, with no luck.

Binh · Accepted Answer

First, you need to list all the path csv files in your folder:

import glob
csvfiles = []
csvfiles = glob.glob("/path/to/folder/*.csv")
print(csvfiles)

Then you will loop over all these files and concatenate them:

list_df = []
for csvfile in csvfiles:
    #read csv file to df
    df = pd.read_csv(csvfile)
    #get the filename ex: 2020-03-19
    csv_name = csvfile.split('/')[-1].split('.')[0]
    #create a new column with all values are filename ex: 2020-03-19
    df['Date'] = csv_name
    #add df to a list
    list_df.append(df)
#concat all the df in the list
final_df = pd.concat(list_df)

Combining separate daily CSVs in pandas

Answers (2)

Related Questions