Python: Loop through Excel sheets, assign header info to columns on each sheet, then merge to one file

Question

I am new to Python and trying to automate some tasks. I have an Excel file with 8 sheets where each sheet has some identifier on top followed below that are tabular data with headers. Each sheet has the identifiers of interest and the tables in the same location. What I want to do is to extract some data from the top of each sheet and insert them as columns, remove unwanted rows(after I have assigned some of them to columns) and columns and then merge into one CSV file as output.

The code I have written does the job. My code reads in each sheet, performs the operations on the sheet, then I start the same process for the next sheet (8 times) before using .concat to merge them.

import pandas as pd
import numpy as np
inputfile = "input.xlsx" 
outputfile = "merged.csv"

##LN X: READ FIRST SHEET AND ASSIGN HEADER INFORMATION TO COLUMNS

df1 = pd.read_excel(inputfile, sheet_name=0, usecols="A:N", index=0)

#Define cell locations of fields in the header area to be assigned to 
columns
#THIS CELL LOCATIONS ARE SAME ON ALL SHEETS
A = df1.iloc[3,4]
B = df1.iloc[2,9]
C = df1.iloc[3,9]
D = df1.iloc[5,9]
E = df1.iloc[4,9]

#Insert well header info as columns in data for worksheet1
df1.insert(0,"column_name", A)

df1.insert(1,"column_name", B)

df1.insert(4,"column_name", E)

# Rename the columns in `enter code here`worksheet1 DataFrame to reflect 
actual column headers
df1.rename(columns={'Unnamed: 0': 'Header1', 
               'Unnamed: 1': 'Header2', }, inplace=True)         

df_merged = pd.concat([df1, df2, df3, df4, df5, df6, df7, 
  df8],ignore_index=True, sort=False)

#LN Y: Remove non-numerical entries

df_merged = df_merged.replace(np.nan, 0)

##Write results to CSV file

df_merged.to_csv(outputfile, index=False)

Since this code will be used on other Excel files with varying numbers of sheets, I am looking for any pointers on how to include the repeating operations in each sheet in a loop. Basically repeating the steps between LN X to LN Y for each sheet (8 times!!). I am struggling with how to use a loop function Thanks in advance for your assistance.

robbwh · Accepted Answer

df1 = pd.read_excel(inputfile, sheet_name=0, usecols="A:N", index=0)

You should change the argument sheet_name to

sheet_name=None

Then df1 will be a dictionary of DataFrames. Then you can loop over df1 using

for df in df1:

  df1[df].insert(0,"column_name", A)
....

Now perform your operations and merge the dfs. You can loop over them again and concatenate them to one final df.

Python: Loop through Excel sheets, assign header info to columns on each sheet, then merge to one file

Answers (1)

Related Questions