Pandas : Import multiple excel files in a single data frame, perform operations one by one and print the output in a single csv file

Question

Three questions within a single piece of code.

I have quite a lot of excel files which follow a similar pattern in their nomenclature, e.g Design__Tolerance_1.xlsx, Design_Tolerance_2.xlsx, etc. kept in one folder

Let's consider the first file Design__Tolerance_1.xlsx as

I am reading the first three columns of the excel file in my python program using the Pandas module as follows

fields = ['Time', 'Test 1', 'Test 2']
df=pd.read_excel('Design_Tolerance_1.xlsx', skipinitialspace=True, 
usecols=fields)

Next, I am finding the mean of the column Test 1 and maximum value of the column Test 2

mean_value = df['Test 1'].mean()
max_value = df['Test 2'].mean()

And, I am printing the output in a seperate .csv file.

columns=["MEAN","MAX"]
data_under_columns = {"MEAN":[mean_value], "MAXIMUM VALUE":[max_value]}
df1 = pd.DataFrame(data_under_columns, columns=columns)

The file output_file.csv will contain the output

df1.to_csv('output_file.csv', sep=",",  index = False)

Could you help me do the following:

I have kept all my files in same folder and I want the program to read all the files having the nomenclature pattern as mentioned above (Design__Tolerance_1.xlsx, Design_Tolerance_2.xlsx, etc.) in the same dataframe df , the program should run as many times as the number of files are there.
Let's say I have four excel files following the same naming pattern (Design__Tolerance_XXX.xlsx) present in the folder, I want the program to run four times and calculate the mean of the column Test 1 and maximum value of the column Test 2 for all the files one after another; and
Print only one .csv file as output_file.csv which contains the output from all the excel files. e.g.

Use of functions are acceptable too.

Jrmyy · Accepted Answer

you can do something like that. This solution is going to go through all your folder, and for every xlsx file, it is going to create a record with the mean and the max value, and once it is done, it is going to create a dataframe with one line per file, and store it as a csv.

# std
import glob

# 3rd
import pandas as pd

# Select all the files in your directory
directory = r'path/to/your/directory'
files = glob.glob(directory + "/*.xlsx")
fields = ["Test 1", "Test 2"]
records = []

for f in files:
    # Every file is put in a temp dataframe, and operations are performed
    temp_df = pd.read_excel(f, skipinitialspace=True, usecols=fields)
    mean_value = temp_df['Test 1'].mean()
    max_value = temp_df['Test 2'].max()
    records.append((mean_value, max_value))

# Finally, we create a dataframe with our records and we store it
df = pd.DataFrame.from_records(records, columns=['MEAN', 'MAX'])
df.to_csv('output_file.csv', sep=',',  index=False)

Pandas : Import multiple excel files in a single data frame, perform operations one by one and print the output in a single csv file

Answers (1)

Related Questions