Reputation: 105
I have a directory ../customer_data/* with 15 folders. Each folder is a unique customer. Example: ../customer_data/customer_1. Within each customer folder there is a CSV called surveys.csv.
GOAL: I want to iterate through all the folders in ../customer_data/*, find the surveys.csv for each unique customer, and create one concatenated dataframe. I also want to add a column to the dataframe that holds the customer id, which is the name of the folder. Here is my attempt:
import glob
import os
import pandas as pd

rootdir = '../customer_data/*'

# a list to hold all the individual pandas DataFrames
dataframes = []

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        # loop through the files and read them in with pandas
        csvfiles = glob.glob(os.path.join(rootdir, 'surveys.csv'))
        df = pd.read_csv(csvfiles)
        df['customer_id'] = os.path.dirname
        dataframes.append(df)

# concatenate them all together
result = pd.concat(dataframes, ignore_index=True)
result.head()
This code is not giving me all 15 files. Please help.
Upvotes: 0
Views: 1859
Reputation: 23099
Let's try pathlib with rglob, which will recursively search your directory structure for all files that match a glob pattern, in this instance surveys.csv.
import pandas as pd
from pathlib import Path

root_dir = Path('/top_level_dir/')

# Map each customer folder name to the path of its surveys.csv.
files = {file.parent.parts[-1]: file for file in root_dir.rglob('surveys.csv')}

# Read each file, tag it with its customer name, and concatenate.
df = pd.concat([pd.read_csv(file).assign(customer=name) for name, file in files.items()])
Note you'll need Python 3.4+ for pathlib.
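As a quick sanity check (a minimal sketch, assuming the 15 customer folders described in the question), you can confirm that every folder contributed a file:

# Should both report 15, one entry per customer folder.
print(len(files))
print(df['customer'].nunique())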
Upvotes: 0
Reputation: 19250
You can use the pathlib module for this.
from pathlib import Path
import pandas as pd
dfs = []
for filepath in Path("customer_data").glob("customer_*/surveys.csv"):
    this_df = pd.read_csv(filepath)
    # Set the customer ID as the name of the parent directory.
    this_df.loc[:, "customer_id"] = filepath.parent.name
    dfs.append(this_df)

df = pd.concat(dfs)
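One follow-up: pd.concat keeps each file's original row index by default, so the combined frame will contain repeated index values. If you want a clean sequential index, pass ignore_index=True as in the question's own attempt:

# Re-number rows 0..n-1 across all concatenated files.
df = pd.concat(dfs, ignore_index=True)
df.head()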
Upvotes: 1