Reputation: 35
I have a Python script that uses two files for running (they are excel files that the script takes as dataframes. Both excels are called in the same way but they are saved in different folders. It is defined in the script from which folders has to take the files and the name of the file). I have to execute this same script for more than 100 pair of files. The execution is the same for all of them. Is there a way to make that the script considers and be executed for each pair of excels automatically? Without having to write the same lines of code for each pair of excel files
folderA contains: fileA_1.xlsx fileA_30.xlsx ...etc
folderB contains: fileB_1.xlsx fileB_30.xlsx ...etc
I have, for fileA_1 and fileB_1, the following script:
import pandas as pd
writer1 = pd.ExcelWriter('file1.xlsx')
fileA_1 = pd.read_excel('folderA\\fileA_1.xlsx')
fileB_1 = pd.read_excel('folderB\\fileB_1.xlsx')
file1_merge = fileA_1.merge(fileB_1, how = 'left', indicator = True)
file1_merge_count = file1_merge.shape[0]
file1_merge_count
file1_merge.to_excel(writer1, 'file1_comparison', index = False)
I need to replicate this same script for each pair of files I have to analize.
Upvotes: 0
Views: 123
Reputation: 2309
This solution uses pathlib.Path to join the folder and file paths, and path.iterdir() to loop over the files in FolderA.
from pathlib import Path
folderA = Path("path/to/folderA/")
folderB = Path("path/to/folderB/")
for file in folderA.iterdir():
fileA = pd.read_excel(file)
try:
fileB = pd.read_excel(folderB / Path(file.name))
except FileNotFoundError as e:
print(f"File {file.name} was found in FolderA but not in FolderB")
continue
file_merge = fileA.merge(fileB, how='left', indicator=True)
file_merge_count = file_merge.shape[0]
writer = pd.ExcelWriter(f"file_{file.stem}.xlsx")
file_merge.to_excel(writer, f'file_{file.stem}_comparison', index=False)
Edit: Re-written based on clarification that the target filenames don't have numerically sequential filenames. Instead this will now loop over the files in FolderA, find the matching file in folderB and then name the merged file with the same file stem.
Upvotes: 0
Reputation: 1431
You can use a loop to automate the process of running your script for multiple pairs of excel files. Here's one way to go about it:
import os
import pandas
folderA = '/path/to/folderA'
folderB = '/path/to/folderB'
files_A = sorted([f for f in os.listdir(folderA) if f.endswith('.xlsx')])
files_B = sorted([f for f in os.listdir(folderB) if f.endswith('.xlsx')])
for fileA, fileB in zip(files_A, files_B):
fileA_df = pd.read_excel(f"{folderA}/{fileA}")
fileB_df = pd.read_excel(f"{folderB}/{fileB}")
file_merge = fileA_df.merge(fileB_df, how = 'left', indicator = True)
file_merge.to_excel(f'{fileA.split(".")[0]}_comparison.xlsx', index = False)
NOTE: This code allows you to iterate in pairs of excel files in two different folders, it'll work if both folders have the same amount of files and the pairs are in alphabetic order or both files have the same name
Example of folder architecture:
/path/to/
|-- folderA/
| |-- fileA_01.xlsx
| |-- fileA_02.xlsx
| |-- ...
| |-- fileA_30.xlsx
|-- folderB/
| |-- fileB_01.xlsx
| |-- fileB_02.xlsx
| |-- ...
| |-- fileB_30.xlsx
Upvotes: 1