Reputation: 269
I have individual CSV files nested inside subfolders of subfolders: a year folder contains month folders, each month folder contains day folders, and each day folder holds the individual CSV files. I would like to combine all of the individual CSVs into one and create a pandas DataFrame.
In the tree diagram, it looks like this:
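(the names below are only placeholders that mirror the year/month/day layout described above)
2022/
├── 01/
│   ├── 01/
│   │   └── each_file.csv
│   └── 02/
│       └── each_file.csv
└── 02/
    └── ...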
I tried the approach below, but nothing was created:
import pandas as pd
import glob
path = r'~/root/up/to/the/folder/2022'
alldata = glob.glob(path + "each*.csv")
alldata.head()
I initially had it looking only for "each*.csv" files, but then realized something is missing in between in order to reach the individual CSVs inside each folder. Maybe a for loop would work, i.e. looping through each folder within each subfolder, but that is where I am stuck right now.
The answer to this question: Combining separate daily CSVs in pandas, covers the case where the files are all in the same folder.
I also tried to make sense of this answer: batch file to concatenate all csv files in a subfolder for all subfolders, but it just doesn't click for me.
I also tried the following, as suggested in Python importing csv files within subfolders:
import os
import pandas as pd

path = '<Insert Path>'        # root folder, e.g. the 2022 directory
file_extension = '.csv'

# walk the whole tree and collect the path of every CSV found
csv_file_list = []
for root, dirs, files in os.walk(path):
    for name in files:
        if name.endswith(file_extension):
            file_path = os.path.join(root, name)
            csv_file_list.append(file_path)

# read each file into its own DataFrame
dfs = [pd.read_csv(f) for f in csv_file_list]
but nothing is showing. I think there is something wrong with the path I am pointing it to, relative to the tree shown above.
Or maybe there is a follow-up step I need to do, because when I run dfs.head()
it raises AttributeError: 'list' object has no attribute 'head'.
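My guess is that the missing step is combining the list of DataFrames into a single one before calling .head(), roughly along these lines (assuming csv_file_list actually gets populated):
# dfs is a plain Python list of DataFrames, so it has no .head();
# concatenating it gives one DataFrame that can be inspected
df = pd.concat(dfs, ignore_index=True)
df.head()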
Upvotes: 2
Views: 1583
Reputation: 1050
The following should work:
from pathlib import Path
import pandas as pd

csv_folder = Path('.')  # path to your folder, e.g. to `2022`
# '**/*.csv' matches CSV files in this folder and in every nested subfolder
df = pd.concat(pd.read_csv(p) for p in csv_folder.glob('**/*.csv'))
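Note that each input file keeps its own row index after the concatenation; if that matters for your use case, you can pass ignore_index=True to pd.concat.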
Alternatively, if you prefer, you can use glob.glob('**/*.csv', recursive=True) instead of the Path.glob method.
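For instance, a rough glob-based equivalent could look like this (the folder path is only a placeholder):
import glob
import os
import pandas as pd

# recursive=True is required so that '**' descends into the month/day subfolders
pattern = os.path.join('path/to/2022', '**', '*.csv')
df = pd.concat(pd.read_csv(f) for f in glob.glob(pattern, recursive=True))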
Upvotes: 3