Run Python script for many files from different folders

Question

I have a Python script that uses two files for running (they are excel files that the script takes as dataframes. Both excels are called in the same way but they are saved in different folders. It is defined in the script from which folders has to take the files and the name of the file). I have to execute this same script for more than 100 pair of files. The execution is the same for all of them. Is there a way to make that the script considers and be executed for each pair of excels automatically? Without having to write the same lines of code for each pair of excel files

folderA contains: fileA_1.xlsx fileA_30.xlsx ...etc

folderB contains: fileB_1.xlsx fileB_30.xlsx ...etc

I have, for fileA_1 and fileB_1, the following script:

import pandas as pd

writer1 = pd.ExcelWriter('file1.xlsx')

fileA_1 = pd.read_excel('folderA\fileA_1.xlsx')
fileB_1 = pd.read_excel('folderB\fileB_1.xlsx')

file1_merge = fileA_1.merge(fileB_1, how = 'left', indicator = True)
file1_merge_count = file1_merge.shape[0]
file1_merge_count

file1_merge.to_excel(writer1, 'file1_comparison', index = False)

I need to replicate this same script for each pair of files I have to analize.

rocha-p · Accepted Answer

You can use a loop to automate the process of running your script for multiple pairs of excel files. Here's one way to go about it:

import os
import pandas

folderA = '/path/to/folderA'
folderB = '/path/to/folderB'

files_A = sorted([f for f in os.listdir(folderA) if f.endswith('.xlsx')])
files_B = sorted([f for f in os.listdir(folderB) if f.endswith('.xlsx')])

for fileA, fileB in zip(files_A, files_B):
    fileA_df = pd.read_excel(f"{folderA}/{fileA}")
    fileB_df = pd.read_excel(f"{folderB}/{fileB}")
    file_merge = fileA_df.merge(fileB_df, how = 'left', indicator = True)
    file_merge.to_excel(f'{fileA.split(".")[0]}_comparison.xlsx', index = False)

NOTE: This code allows you to iterate in pairs of excel files in two different folders, it'll work if both folders have the same amount of files and the pairs are in alphabetic order or both files have the same name

Example of folder architecture:

/path/to/
  |-- folderA/
  |    |-- fileA_01.xlsx
  |    |-- fileA_02.xlsx
  |    |-- ...
  |    |-- fileA_30.xlsx
  |-- folderB/
  |    |-- fileB_01.xlsx
  |    |-- fileB_02.xlsx
  |    |-- ...
  |    |-- fileB_30.xlsx

Run Python script for many files from different folders

Answers (2)

Related Questions