Aman Singh

Reputation: 1241

Merging all CSV files into one big CSV file per folder using pandas

I have a directory with multiple sub-directories and sub-sub-directories. My goal is to read all the CSV files and append them into one big CSV file per sub-directory (note: not one big CSV file per sub-sub-directory, but one file per sub-directory).

The following is the code I have been using to append all the CSV files into one big CSV file. The only issue with this code is that the final CSV file turns out to be around ~890 MB, which makes it unreadable in Microsoft Excel: opening the file runs into a "memory issue".

import os
import pandas as pd

dirName = 'FTP_Data/2019/'

allFilesList = list()
print('[info.] looking for .csv files.')
for (dirpath, dirnames, filenames) in os.walk(dirName):
    for file in filenames:
        if file.endswith('.csv'):
            allFilesList.append(os.path.join(dirpath, file).replace('\\', '/'))

print('[info.] combining all the .csv files.')
# note: sort is a pd.concat argument, not a pd.read_csv one
combined_csv = pd.concat([pd.read_csv(file) for file in allFilesList], sort=True)
print('[info.] exporting to one final output file.')
combined_csv.to_csv('FinalOutputFrom2019.csv', index=False, encoding='utf-8')

The above code generates one big CSV file called "FinalOutputFrom2019.csv".
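Incidentally, if building one huge DataFrame in memory is itself a concern, a minimal sketch of an alternative (assuming all the CSVs share the same columns) is to stream each file into the output one at a time instead of concatenating everything at once:

```python
import pandas as pd

def append_csvs(file_list, out_path):
    """Append each CSV to out_path one file at a time; only the header of
    the first file is written, so all inputs are assumed to share columns."""
    first = True
    for path in file_list:
        df = pd.read_csv(path)
        df.to_csv(out_path, mode='w' if first else 'a',
                  header=first, index=False)
        first = False
```

This only ever holds one file's rows in memory, though the combined output is of course still too large for Excel to open comfortably.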

Now, if the directory is structured in the following manner, I want it to generate 12 files, one per month: 2019-01.csv, 2019-02.csv, 2019-03.csv, 2019-04.csv, 2019-05.csv, 2019-06.csv, 2019-07.csv, 2019-08.csv, 2019-09.csv, 2019-10.csv, 2019-11.csv, 2019-12.csv

FTP_data
|-- 2019
    |-- 2019-01
        |-- 2019-01-01
            |-- foo.csv
            |-- bar.csv
            |-- foobar.xml
        |-- 2019-01-03
            |-- foobar.csv
        |-- ...
    |-- 2019-02
        |-- 2019-02-02
            |-- ...
        |-- 2019-02-03
            |-- ...
        |-- 2019-02-05
            |-- ...
    |-- ...
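One way to get the per-sub-directory grouping described above is to bucket the walked file paths by their month folder first. A minimal sketch, assuming month folder names like `2019-01` directly under the root, as in the tree above:

```python
import os
from collections import defaultdict

def group_by_month(root):
    """Map each month sub-directory name (e.g. '2019-01') to the list of
    CSV paths found anywhere beneath it."""
    groups = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        for file in filenames:
            if file.endswith('.csv'):
                # the first path component below root is the month folder
                rel = os.path.relpath(dirpath, root)
                month = rel.replace('\\', '/').split('/')[0]
                groups[month].append(os.path.join(dirpath, file))
    return dict(groups)
```

Each bucket can then be concatenated and written out as its own monthly CSV.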

Upvotes: 1

Views: 74

Answers (2)

everyt4u

Reputation: 352

import os
import pandas as pd


dirName = 'FTP_Data/2019/2019-'
Months = ('01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12')
# months are hard coded but if you prefer you can loop too.

for mn in Months:
    path = dirName + mn

    # Get the list of csv files anywhere under each month's sub-directory
    # (os.listdir would miss the daily sub-sub-directories, so walk instead):
    Monthly_Csv_Files = [os.path.join(dirpath, f)
                         for dirpath, dirnames, filenames in os.walk(path)
                         for f in filenames if f.endswith('.csv')]

    # *Optional* Show the csv files to be merged to the user:
    print(f'CSV files for month_{mn} are :', Monthly_Csv_Files)

    # Read and concat all csv files in that sub-directory into one DataFrame:
    combined_csv = pd.concat([pd.read_csv(file) for file in Monthly_Csv_Files])

    # Write the combined DataFrame as a csv file in that sub-folder.
    # Removing 'path+' below would write all csv files to the .py's directory.
    # (Caveat: the output ends in .csv too, so a re-run would pick it up.)
    combined_csv.to_csv(path + f'/Monthly_Output_From_2019_{mn}.csv', index=False, encoding='utf-8')
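As the comment in the answer notes, the hard-coded tuple of months can also be generated with a loop; one possible sketch:

```python
# generate '01' .. '12' instead of hard-coding the tuple
Months = tuple(f'{m:02d}' for m in range(1, 13))
```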

Upvotes: 0

Bill Huang

Reputation: 4648

Just adapt your working snippet to a for-loop through all months. Try this:

Setup

import os
import pandas as pd

yr = 2019
dirName = f'FTP_Data/{yr}/'

def export_csv(dirName, csv_name):
    allFilesList = list()
    print('[info.] looking for .csv files.')
    for dirpath, dirnames, filenames in os.walk(dirName):
        for file in filenames:
            if file.endswith('.csv'):
                allFilesList.append(os.path.join(dirpath, file).replace('\\', '/'))

    print('[info.] combining all the .csv files.')
    # note: sort is a pd.concat argument, not a pd.read_csv one
    combined_csv = pd.concat([pd.read_csv(file) for file in allFilesList], sort=True)
    print('[info.] exporting to one final output file.')
    combined_csv.to_csv(f'{csv_name}.csv', index=False, encoding='utf-8')

Execute

for mon in range(1, 13):
    # yr = dirName.split("/")[1]
    csv_name = f"{yr}-{mon:02}"
    subdirName = f"{dirName}{csv_name}"
    export_csv(subdirName, csv_name)
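One caveat with this loop: if a month's folder contains no CSV files, `pd.concat` on the empty list raises a `ValueError`. A hedged guard (skipping empty months is an assumption about the desired behaviour) could wrap the concatenation:

```python
import pandas as pd

def safe_concat(frames):
    """Concatenate a list of DataFrames, returning None when the list is
    empty instead of letting pd.concat raise ValueError."""
    if not frames:
        return None
    return pd.concat(frames)
```

The caller can then skip writing an output file whenever `safe_concat` returns `None`.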

Upvotes: 1
