Reputation: 47
I have some code that looks like this:
import pandas as pd
import glob
root_directory = r"\\some\shared\directory"
all_files = glob.glob(f'{root_directory}\CPE_*.csv')
li = []
for filename in all_files:
frame = pd.read_csv(filename, index_col=None, header=0, encoding='latin1')
li.append(frame)
df = pd.concat(li, axis=0, ignore_index=True)
This code allows me to concatenate the data and create a master csv file, but I want to add a new column to each dataframe as I loop through them. The file names look something like this: CPE_02082020.csv , CPE_02092020 , etc. So the date is in the file name itself with the format of mmddyyyy. How do I add a date column to each file as I loop through and concatenate them?
Upvotes: 1
Views: 1224
Reputation: 23099
You can assign the column during a for loop, let's simplify a little using a list comprehension and Pathlib
:
import pandas as pd
from pathlib import Path
root_dir = Path('\\some\shared\directory')
all_files = root_dir.glob('*CPE_*.csv')
dfs = [
pd.read_csv(f, index_col=None, header=0, encoding='latin1')\
.assign(date_col=
pd.to_datetime(f.stem.split('_')[1],format='%m%d%Y',errors='coerce'))
for f in all_files
]
print(pd.concat(dfs))
Upvotes: 0
Reputation: 105
To add a date column to each dataframe you can simply add a new column to frame while looping through all_files. This method was suggested by biobirdman on a previous post. You can then use split() to get just the date from the filename.
Try:
for filename in all_files:
frame = pd.read_csv(filename, index_col=None, header=0, encoding='latin1')
frame['date'] = filename.split('_')[1]
li.append(frame)
df = pd.concat(li, axis=0, ignore_index=True)
Then to convert df['date'] to datetime add:
df['date'] = pd.to_datetime(df['date'])
Upvotes: 3