doubleD

Reputation: 269

Combine all CSV files in nested subfolders (in pandas or in PowerShell/terminal) and create a pandas DataFrame

I have individual CSV files nested several subfolders deep. The year folder contains month folders, each month folder contains day folders, and each day folder contains one CSV file. I would like to combine all of the individual CSV files into one pandas DataFrame.

In the tree diagram, it looks like this:

[Image: tree diagram of the folder structure]

I tried the approach below, but nothing was created:


import pandas as pd
import glob

path = r'~/root/up/to/the/folder/2022'
alldata = glob.glob(path + "each*.csv")
alldata.head()

I initially had it look only for "each*.csv" files, but realized that something is missing in the path pattern to reach the individual CSV files within each folder. Maybe a for loop would work, looping through each folder within each subfolder, but that is where I am stuck right now.

The answer to "Combining separate daily CSVs in pandas" only covers files that are all in the same folder.

I tried to make sense of this answer: "batch file to concatenate all csv files in a subfolder for all subfolders", but it just doesn't click for me.

I also tried the following, as suggested in "Python importing csv files within subfolders":

import os
import pandas as pd

path = '<Insert Path>'
file_extension = '.csv'
csv_file_list = []
for root, dirs, files in os.walk(path):
    for name in files:
        if name.endswith(file_extension):
            file_path = os.path.join(root, name)
            csv_file_list.append(file_path)

dfs = [pd.read_csv(f) for f in csv_file_list]

but nothing shows up. I think there is something wrong with the path, given the folder structure shown in the tree above.

Or maybe there is a further step I need to do, because when I ran dfs.head() I got AttributeError: 'list' object has no attribute 'head'

Upvotes: 2

Views: 1583

Answers (1)

asdf

Reputation: 1050

The following should work:

from pathlib import Path
import pandas as pd

csv_folder = Path('.')  # path to your folder, e.g. to `2022`
df = pd.concat(pd.read_csv(p) for p in csv_folder.glob('**/*.csv'))

Alternatively, if you prefer you can also use glob.glob('**/*.csv', recursive=True) instead of the Path.glob method.
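
As a minimal sketch of that glob.glob variant (the helper name combine_csvs is hypothetical, not part of pandas): it expands a leading `~` first, since glob does not do that itself, which may be why the pattern in the question matched nothing. It also calls pd.concat on the list of frames, which is the step missing when dfs.head() raised the AttributeError. Passing ignore_index=True is an assumption about what is wanted: one continuous row index instead of each file's own index repeated.

```python
import glob
import os

import pandas as pd


def combine_csvs(root):
    """Read every CSV below `root` (at any depth) into one DataFrame."""
    # glob does not expand "~", so resolve it explicitly first.
    root = os.path.expanduser(root)
    # "**" only matches nested folders when recursive=True is passed.
    pattern = os.path.join(root, "**", "*.csv")
    paths = sorted(glob.glob(pattern, recursive=True))
    if not paths:
        # pd.concat raises ValueError on an empty input, so fail with a
        # clearer message when the path or pattern matched nothing.
        raise FileNotFoundError(f"no CSV files found under {root}")
    # Concatenate the per-file frames into a single DataFrame;
    # ignore_index=True renumbers the rows 0..n-1.
    return pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)
```

With the path from the question this would be called as df = combine_csvs('~/root/up/to/the/folder/2022'), after which df.head() works because df is a DataFrame rather than a list.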

Upvotes: 3
