Reputation: 365
Three questions within a single piece of code.
I have quite a lot of excel files which follow a similar pattern in their nomenclature, e.g Design__Tolerance_1.xlsx
, Design_Tolerance_2.xlsx
, etc. kept in one folder
Let's consider the first file Design__Tolerance_1.xlsx
as
I am reading the first three columns of the excel file in my python program using the Pandas module as follows
fields = ['Time', 'Test 1', 'Test 2']
df=pd.read_excel('Design_Tolerance_1.xlsx', skipinitialspace=True,
usecols=fields)
Next, I am finding the mean of the column Test 1
and maximum value of the column Test 2
mean_value = df['Test 1'].mean()
max_value = df['Test 2'].mean()
And, I am printing the output in a seperate .csv
file.
columns=["MEAN","MAX"]
data_under_columns = {"MEAN":[mean_value], "MAXIMUM VALUE":[max_value]}
df1 = pd.DataFrame(data_under_columns, columns=columns)
The file output_file.csv
will contain the output
df1.to_csv('output_file.csv', sep=",", index = False)
Could you help me do the following:
I have kept all my files in same folder and I want the program to read all the files having the nomenclature pattern as mentioned above (Design__Tolerance_1.xlsx
, Design_Tolerance_2.xlsx
, etc.) in the same dataframe df
, the program should run as many times as the number of files are there.
Let's say I have four excel files following the same naming pattern (Design__Tolerance_XXX.xlsx
) present in the folder, I want the program to run four times and calculate the mean of the column Test 1
and maximum value of the column Test 2
for all the files one after another; and
Print only one .csv
file as output_file.csv
which contains the output from all the excel files.
e.g.
Use of functions are acceptable too.
Upvotes: 0
Views: 3531
Reputation: 151
you can do something like that. This solution is going to go through all your folder, and for every xlsx file, it is going to create a record with the mean and the max value, and once it is done, it is going to create a dataframe with one line per file, and store it as a csv.
# std
import glob
# 3rd
import pandas as pd
# Select all the files in your directory
directory = r'path/to/your/directory'
files = glob.glob(directory + "/*.xlsx")
fields = ["Test 1", "Test 2"]
records = []
for f in files:
# Every file is put in a temp dataframe, and operations are performed
temp_df = pd.read_excel(f, skipinitialspace=True, usecols=fields)
mean_value = temp_df['Test 1'].mean()
max_value = temp_df['Test 2'].max()
records.append((mean_value, max_value))
# Finally, we create a dataframe with our records and we store it
df = pd.DataFrame.from_records(records, columns=['MEAN', 'MAX'])
df.to_csv('output_file.csv', sep=',', index=False)
Upvotes: 2