Reputation: 374
I have a bunch of CSV files, each one named as the date it was collected, ie. :
2020-03-21.csv
2020-03-22.csv
2020-03-23.csv
etc....
I would like to create a single pandas DataFrame containing the data from all the CSVs, with a new date column listing the date that data is from. As a toy example:
Current, single CSV (eg. 2020-03-19.csv
):
Country X Y
Australia 10 14
China 24 16
Canada 18 35
Desired result (combined DataFrame):
Country X Y Date
Australia 10 14 2020-03-19
China 24 16 2020-03-19
Canada 18 35 2020-03-19
Australia 14 16 2020-03-20
China 23 17 2020-03-20
Canada 20 38 2020-03-20
Australia 25 19 2020-03-21
etc...
What is the best way to achieve this in pandas? I have tried a couple methods using pd.merge
and pd.concat
, with no luck.
Upvotes: 3
Views: 265
Reputation: 1173
First, you need to list all the path csv files in your folder:
import glob
csvfiles = []
csvfiles = glob.glob("/path/to/folder/*.csv")
print(csvfiles)
Then you will loop over all these files and concatenate them:
list_df = []
for csvfile in csvfiles:
#read csv file to df
df = pd.read_csv(csvfile)
#get the filename ex: 2020-03-19
csv_name = csvfile.split('/')[-1].split('.')[0]
#create a new column with all values are filename ex: 2020-03-19
df['Date'] = csv_name
#add df to a list
list_df.append(df)
#concat all the df in the list
final_df = pd.concat(list_df)
Upvotes: 2
Reputation: 28644
This is just a mock up, but should work : it relies on the pathlib module for easier file management :
from pathlib import Path
#initialize path on directory
folder = Path(folder_name)
#no filters done here, since u r sure it a bunch of csv files
combo = (pd.read_csv(f)
#stem is a pathlib method that extracts the name without the suffix
#if the pd.to_datetime does not work, then discard it
#and just assign f.stem to Date
#u can convert to datetime after
.assign(Date=pd.to_datetime(f.stem))
for f in folder.iterdir())
#combine data into one dataframe
everything = pd.concat(combo, ignore_index = True)
Upvotes: 1