Reputation: 1241
I have a directory with multiple sub-directories and sub-sub-directories. My goal is to read all the CSV files and append them into one big CSV file per sub-directory (i.e. not one big CSV file per sub-sub-directory, but one file per sub-directory).
The following is the code I have been using to append all the CSV files into one big CSV file. The only issue with this code is that the final CSV file turns out to be around 890 MB, which makes it unreadable in Microsoft Excel: it runs into a memory error when you try to open the file.
import os
import pandas as pd

dirName = 'FTP_Data/2019/'
allFilesList = list()
print('[info.] looking for .csv files.')
for (dirpath, dirnames, filenames) in os.walk(dirName):
    for file in filenames:
        if file.endswith('.csv'):
            allFilesList += [os.path.join(dirpath, file).replace('\\', '/')]
print('[info.] combining all the .csv files.')
# Note: sort=True is a pd.concat argument, not a pd.read_csv one:
combined_csv = pd.concat([pd.read_csv(file) for file in allFilesList], sort=True)
print('[info.] exporting to one final output file.')
combined_csv.to_csv('FinalOutputFrom2019.csv', index=False, encoding='utf-8')
The above code generates one big CSV file called "FinalOutputFrom2019.csv".
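As an aside, if pd.concat itself ever becomes the memory bottleneck at this size, the files can be appended to the output one at a time instead of being held in RAM together. A minimal sketch, reusing allFilesList and the imports from the snippet above (the output name is illustrative, and this does not fix Excel's own size limit):

first = True
for file in allFilesList:
    df = pd.read_csv(file)
    # Write the first file with its header, then append the rest
    # without headers (assumes all files share the same columns):
    df.to_csv('FinalOutputFrom2019_streamed.csv',
              mode='w' if first else 'a',
              header=first, index=False, encoding='utf-8')
    first = False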
Now, if the directory is structured in the following manner, I want it to generate 12 files, one per month: 2019-01.csv, 2019-02.csv, 2019-03.csv, 2019-04.csv, 2019-05.csv, 2019-06.csv, 2019-07.csv, 2019-08.csv, 2019-09.csv, 2019-10.csv, 2019-11.csv, 2019-12.csv (a small sketch of the grouping idea follows the tree below):
FTP_data
|-- 2019
    |-- 2019-01
        |-- 2019-01-01
            |-- foo.csv
            |-- bar.csv
            |-- foobar.xml
        |-- 2019-01-03
            |-- foobar.csv
        |-- ...
    |-- 2019-02
        |-- 2019-02-02
            |-- ...
        |-- 2019-02-03
            |-- ...
        |-- 2019-02-05
            |-- ...
    |-- ...
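In other words, the grouping key for each output file is the first-level folder under 2019 (e.g. 2019-01). A minimal sketch of that bucketing idea, assuming the layout above (filesByMonth is an illustrative name, not part of my code):

import os
from collections import defaultdict

# Bucket every csv path by its month folder, e.g.
# 'FTP_Data/2019/2019-01/2019-01-01/foo.csv' -> key '2019-01':
filesByMonth = defaultdict(list)
for dirpath, dirnames, filenames in os.walk('FTP_Data/2019/'):
    for file in filenames:
        if file.endswith('.csv'):
            fullPath = os.path.join(dirpath, file).replace('\\', '/')
            month = fullPath.split('/')[2]  # FTP_Data/2019/<month>/...
            filesByMonth[month].append(fullPath)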
Upvotes: 1
Views: 74
Reputation: 352
import os
import pandas as pd

dirName = 'FTP_Data/2019/2019-'
# Months are hard-coded, but you can generate them in a loop if you
# prefer (see the one-liner after this block).
Months = ('01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12')

for mn in Months:
    path = dirName + mn
    # Walk the month directory so that csv files inside the day-level
    # sub-directories (e.g. 2019-01-01) are picked up as well:
    Monthly_Csv_Files = []
    for dirpath, dirnames, filenames in os.walk(path):
        Monthly_Csv_Files += [os.path.join(dirpath, f) for f in filenames if f.endswith('.csv')]
    # *Optional*: show the csv files to be merged to the user:
    print(f'CSV files for month_{mn} are:', Monthly_Csv_Files)
    # Read and concat all csv files in that sub-directory into one DataFrame:
    combined_csv = pd.concat([pd.read_csv(file) for file in Monthly_Csv_Files])
    # Write the combined DataFrame as a csv file in that sub-folder.
    # Removing 'path +' below will write all csv files to the .py's directory:
    combined_csv.to_csv(path + f'/Monthly_Output_From_2019_{mn}.csv', index=False, encoding='utf-8')
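As the comment in the snippet notes, the hard-coded tuple can also be generated; a one-line equivalent (just an alternative, the tuple above works the same):

# Generate '01' through '12' instead of hard-coding them:
Months = tuple(f'{m:02d}' for m in range(1, 13))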
Upvotes: 0
Reputation: 4648
Just adapt your working snippet to a for-loop through all months. Try this:
Setup
import os
import pandas as pd

yr = 2019
dirName = f'FTP_Data/{yr}/'

def export_csv(dirName, csv_name):
    allFilesList = list()
    print('[info.] looking for .csv files.')
    for dirpath, dirnames, filenames in os.walk(dirName):
        for file in filenames:
            if file.endswith('.csv'):
                allFilesList.append(os.path.join(dirpath, file).replace('\\', '/'))
    print('[info.] combining all the .csv files.')
    # Note: sort=True belongs to pd.concat, not pd.read_csv:
    combined_csv = pd.concat([pd.read_csv(file) for file in allFilesList], sort=True)
    print('[info.] exporting to one final output file.')
    combined_csv.to_csv(f'{csv_name}.csv', index=False, encoding='utf-8')
Execute
for mon in range(1, 13):
    # yr = dirName.split("/")[1]
    csv_name = f"{yr}-{mon:02}"
    subdirName = f"{dirName}{csv_name}"
    export_csv(subdirName, csv_name)
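One caveat: if a month folder contains no .csv files at all, allFilesList stays empty and pd.concat raises a ValueError. A small guard inside export_csv, placed just before the concat, would skip such months (a sketch; the early return is my addition, not part of the answer above):

    if not allFilesList:
        print(f'[warn] no .csv files found under {dirName}; skipping.')
        return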
Upvotes: 1