Reputation: 3
I am new to Python and trying to automate some tasks. I have an Excel file with 8 sheets where each sheet has some identifier on top followed below that are tabular data with headers. Each sheet has the identifiers of interest and the tables in the same location. What I want to do is to extract some data from the top of each sheet and insert them as columns, remove unwanted rows(after I have assigned some of them to columns) and columns and then merge into one CSV file as output.
The code I have written does the job. My code reads in each sheet, performs the operations on the sheet, then I start the same process for the next sheet (8 times) before using .concat to merge them.
import pandas as pd
import numpy as np
inputfile = "input.xlsx"
outputfile = "merged.csv"
##LN X: READ FIRST SHEET AND ASSIGN HEADER INFORMATION TO COLUMNS
df1 = pd.read_excel(inputfile, sheet_name=0, usecols="A:N", index=0)
#Define cell locations of fields in the header area to be assigned to
columns
#THIS CELL LOCATIONS ARE SAME ON ALL SHEETS
A = df1.iloc[3,4]
B = df1.iloc[2,9]
C = df1.iloc[3,9]
D = df1.iloc[5,9]
E = df1.iloc[4,9]
#Insert well header info as columns in data for worksheet1
df1.insert(0,"column_name", A)
df1.insert(1,"column_name", B)
df1.insert(4,"column_name", E)
# Rename the columns in `enter code here`worksheet1 DataFrame to reflect
actual column headers
df1.rename(columns={'Unnamed: 0': 'Header1',
'Unnamed: 1': 'Header2', }, inplace=True)
df_merged = pd.concat([df1, df2, df3, df4, df5, df6, df7,
df8],ignore_index=True, sort=False)
#LN Y: Remove non-numerical entries
df_merged = df_merged.replace(np.nan, 0)
##Write results to CSV file
df_merged.to_csv(outputfile, index=False)
Since this code will be used on other Excel files with varying numbers of sheets, I am looking for any pointers on how to include the repeating operations in each sheet in a loop. Basically repeating the steps between LN X to LN Y for each sheet (8 times!!). I am struggling with how to use a loop function Thanks in advance for your assistance.
Upvotes: 0
Views: 1408
Reputation: 377
df1 = pd.read_excel(inputfile, sheet_name=0, usecols="A:N", index=0)
You should change the argument sheet_name to
sheet_name=None
Then df1 will be a dictionary of DataFrames. Then you can loop over df1 using
for df in df1:
df1[df].insert(0,"column_name", A)
....
Now perform your operations and merge the dfs. You can loop over them again and concatenate them to one final df.
Upvotes: 1