Aman Singh

Reputation: 1241

Merging all CSV files into one big CSV file per folder using pandas

I have a directory with multiple sub-directories and sub-sub-directories. My goal is to read all the CSV files and append them into one big CSV file per sub-directory (note: not one big CSV file per sub-sub-directory, but one file per sub-directory).

The following is the code I have been using to append all the CSV files into one big CSV file. The only issue with this code is that the final CSV file turns out to be around ~890 MB, which makes it unreadable in Microsoft Excel: opening the file runs into a "memory issue".

import os
import pandas as pd

dirName = 'FTP_Data/2019/'

allFilesList = list()
print('[info.] looking for .csv files.')
for (dirpath, dirnames, filenames) in os.walk(dirName):
    for file in filenames:
        if file.endswith('.csv'):
            allFilesList.append(os.path.join(dirpath, file).replace('\\', '/'))

print('[info.] combining all the .csv files.')
# note: sort is a pd.concat argument, not a pd.read_csv one
combined_csv = pd.concat([pd.read_csv(file) for file in allFilesList], sort=True)
print('[info.] exporting to one final output file.')
combined_csv.to_csv('FinalOutputFrom2019.csv', index=False, encoding='utf-8')

The above code generates one big CSV file called "FinalOutputFrom2019.csv".
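Incidentally, if building one huge DataFrame in memory is itself a concern, a minimal sketch of an alternative (assuming all the CSVs share the same columns) is to stream each file into the output one at a time instead of concatenating everything at once:

```python
import pandas as pd

def append_csvs(file_list, out_path):
    """Append each CSV to out_path one file at a time; only the header of
    the first file is written, so all inputs are assumed to share columns."""
    first = True
    for path in file_list:
        df = pd.read_csv(path)
        df.to_csv(out_path, mode='w' if first else 'a',
                  header=first, index=False)
        first = False
```

This only ever holds one file's rows in memory, though the combined output is of course still too large for Excel to open comfortably.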

Now, if the directory is structured in the following manner, I want it to generate 12 files, one per month: 2019-01.csv, 2019-02.csv, 2019-03.csv, 2019-04.csv, 2019-05.csv, 2019-06.csv, 2019-07.csv, 2019-08.csv, 2019-09.csv, 2019-10.csv, 2019-11.csv, 2019-12.csv

FTP_data
|-- 2019
    |-- 2019-01
        |-- 2019-01-01
            |-- foo.csv
            |-- bar.csv
            |-- foobar.xml
        |-- 2019-01-03
            |-- foobar.csv
        |-- ...
    |-- 2019-02
        |-- 2019-02-02
            |-- ...
        |-- 2019-02-03
            |-- ...
        |-- 2019-02-05
            |-- ...
    |-- ...
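One way to get the per-sub-directory grouping described above is to bucket the walked file paths by their month folder first. A minimal sketch, assuming month folder names like `2019-01` directly under the root, as in the tree above:

```python
import os
from collections import defaultdict

def group_by_month(root):
    """Map each month sub-directory name (e.g. '2019-01') to the list of
    CSV paths found anywhere beneath it."""
    groups = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        for file in filenames:
            if file.endswith('.csv'):
                # the first path component below root is the month folder
                rel = os.path.relpath(dirpath, root)
                month = rel.replace('\\', '/').split('/')[0]
                groups[month].append(os.path.join(dirpath, file))
    return dict(groups)
```

Each bucket can then be concatenated and written out as its own monthly CSV.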

Upvotes: 1

Views: 74

Answers (2)

everyt4u

Reputation: 352

import os
import pandas as pd


dirName = 'FTP_Data/2019/2019-'
Months = ('01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12')
# months are hard coded but if you prefer you can loop too.

for mn in Months:
    path = dirName + mn

    # Get the list of csv files anywhere under each month's sub-directory
    # (os.listdir would miss the daily sub-sub-directories, so walk instead):
    Monthly_Csv_Files = [os.path.join(dirpath, f)
                         for dirpath, dirnames, filenames in os.walk(path)
                         for f in filenames if f.endswith('.csv')]

    # *Optional* Show the csv files to be merged to the user:
    print(f'CSV files for month_{mn} are :', Monthly_Csv_Files)

    # Read and concat all csv files in that sub-directory into one DataFrame:
    combined_csv = pd.concat([pd.read_csv(file) for file in Monthly_Csv_Files])

    # Write the combined DataFrame as a csv file in that sub-folder.
    # Removing 'path+' below would write all csv files to the .py's directory.
    # (Caveat: the output ends in .csv too, so a re-run would pick it up.)
    combined_csv.to_csv(path + f'/Monthly_Output_From_2019_{mn}.csv', index=False, encoding='utf-8')
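As the comment in the answer notes, the hard-coded tuple of months can also be generated with a loop; one possible sketch:

```python
# generate '01' .. '12' instead of hard-coding the tuple
Months = tuple(f'{m:02d}' for m in range(1, 13))
```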

Upvotes: 0

Bill Huang

Reputation: 4648

Just adapt your working snippet to a for-loop through all months. Try this:

Setup

import os
import pandas as pd

yr = 2019
dirName = f'FTP_Data/{yr}/'

def export_csv(dirName, csv_name):
    allFilesList = list()
    print('[info.] looking for .csv files.')
    for dirpath, dirnames, filenames in os.walk(dirName):
        for file in filenames:
            if file.endswith('.csv'):
                allFilesList.append(os.path.join(dirpath, file).replace('\\', '/'))

    print('[info.] combining all the .csv files.')
    # note: sort is a pd.concat argument, not a pd.read_csv one
    combined_csv = pd.concat([pd.read_csv(file) for file in allFilesList], sort=True)
    print('[info.] exporting to one final output file.')
    combined_csv.to_csv(f'{csv_name}.csv', index=False, encoding='utf-8')

Execute

for mon in range(1, 13):
    # yr = dirName.split("/")[1]
    csv_name = f"{yr}-{mon:02}"
    subdirName = f"{dirName}{csv_name}"
    export_csv(subdirName, csv_name)
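One caveat with this loop: if a month's folder contains no CSV files, `pd.concat` on the empty list raises a `ValueError`. A hedged guard (skipping empty months is an assumption about the desired behaviour) could wrap the concatenation:

```python
import pandas as pd

def safe_concat(frames):
    """Concatenate a list of DataFrames, returning None when the list is
    empty instead of letting pd.concat raise ValueError."""
    if not frames:
        return None
    return pd.concat(frames)
```

The caller can then skip writing an output file whenever `safe_concat` returns `None`.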

Upvotes: 1
