Looping through csv files to create concatenated dataframe. Need to use date in file name to create new column for each file

Question

I have some code that looks like this:

import pandas as pd
import glob

root_directory = r"\some\shared\directory"

all_files = glob.glob(f'{root_directory}\CPE_*.csv')

li = []

for filename in all_files:
    frame = pd.read_csv(filename, index_col=None, header=0, encoding='latin1')
    li.append(frame)

df = pd.concat(li, axis=0, ignore_index=True)

This code allows me to concatenate the data and create a master csv file, but I want to add a new column to each dataframe as I loop through them. The file names look something like this: CPE_02082020.csv , CPE_02092020 , etc. So the date is in the file name itself with the format of mmddyyyy. How do I add a date column to each file as I loop through and concatenate them?

Blake · Accepted Answer

To add a date column to each dataframe you can simply add a new column to frame while looping through all_files. This method was suggested by biobirdman on a previous post. You can then use split() to get just the date from the filename.

Try:

for filename in all_files:
    frame = pd.read_csv(filename, index_col=None, header=0, encoding='latin1')
    frame['date'] = filename.split('_')[1]
    li.append(frame)

df = pd.concat(li, axis=0, ignore_index=True)

Then to convert df['date'] to datetime add:

df['date'] = pd.to_datetime(df['date'])

Looping through csv files to create concatenated dataframe. Need to use date in file name to create new column for each file

Answers (2)

Related Questions