johnnybarrels
johnnybarrels

Reputation: 374

Combining separate daily CSVs in pandas

I have a bunch of CSV files, each one named as the date it was collected, ie. :

2020-03-21.csv
2020-03-22.csv
2020-03-23.csv
etc....

I would like to create a single pandas DataFrame containing the data from all the CSVs, with a new date column listing the date that data is from. As a toy example:

Current, single CSV (eg. 2020-03-19.csv):

Country           X         Y

Australia        10        14
China            24        16
Canada           18        35

Desired result (combined DataFrame):

Country           X         Y         Date

Australia        10        14       2020-03-19
China            24        16       2020-03-19
Canada           18        35       2020-03-19
Australia        14        16       2020-03-20
China            23        17       2020-03-20
Canada           20        38       2020-03-20
Australia        25        19       2020-03-21
etc...

What is the best way to achieve this in pandas? I have tried a couple methods using pd.merge and pd.concat, with no luck.

Upvotes: 3

Views: 265

Answers (2)

Binh
Binh

Reputation: 1173

First, you need to list all the path csv files in your folder:

import glob
csvfiles = []
csvfiles = glob.glob("/path/to/folder/*.csv")
print(csvfiles)

Then you will loop over all these files and concatenate them:

list_df = []
for csvfile in csvfiles:
    #read csv file to df
    df = pd.read_csv(csvfile)
    #get the filename ex: 2020-03-19
    csv_name = csvfile.split('/')[-1].split('.')[0]
    #create a new column with all values are filename ex: 2020-03-19
    df['Date'] = csv_name
    #add df to a list
    list_df.append(df)
#concat all the df in the list
final_df = pd.concat(list_df)

Upvotes: 2

sammywemmy
sammywemmy

Reputation: 28644

This is just a mock up, but should work : it relies on the pathlib module for easier file management :

from pathlib import Path

#initialize path on directory
folder = Path(folder_name)

#no filters done here, since u r sure it a bunch of csv files

combo = (pd.read_csv(f)
         #stem is a pathlib method that extracts the name without the suffix
         #if the pd.to_datetime does not work, then discard it
         #and just assign f.stem to Date
         #u can convert to datetime after
         .assign(Date=pd.to_datetime(f.stem))
         for f in folder.iterdir())

#combine data into one dataframe
everything = pd.concat(combo, ignore_index = True)

Upvotes: 1

Related Questions