Reputation: 4408
I have a single excel workbook, df, that contains two tabs, Sheet1 and Sheet2. I would like to extract values from both tabs and create a new dataframe using openpyxl/Pandas.
Sheet1
2021 2021
q1 q2
ID 1 1
ID2 3 3
name A A
Sheet2
2021 2021
q1 q2
ID 2 2
ID2 2 2
name B B
Desired
quarter year ID ID2 name
q1 2021 1 3 A
q1 2021 2 2 B
Doing
#Load openpyxl
import openpyxl
wb = openpyxl.load_workbook("df.xlsx")
ws1 = wb.worksheets[0]
ws2 = wb.worksheets[1]
#create loop that will iterate over the first row and end at 2nd column for each sheet
for row in ws1.iter_rows(min_row = 0, max_col = 1, max_row = 3, min_col = 0
for cell in row:
print(cell.value, end="")
print()
for row in ws2.iter_rows(min_row = 0, max_col = 1, max_row = 3, min_col = 0
for cell in row:
print(cell.value, end="")
print()
I am having trouble with creating a new dataframe from the values collected. Any suggestion or input is appreciated. I am still troubleshooting this.
Upvotes: 2
Views: 847
Reputation: 9207
pd.read_excel can read a specific sheet or multiple, like shown below:
import pandas as pd
dict_dfs = pd.read_excel("df.xlsx", sheet_name=[0,1])
df = pd.concat(dict_dfs)
Afterwards you can iterate over the dictionary of dataframes or combine them directly if the format of the excel files already allows it.
Example, after loading with sample data:
import pandas as pd
df1 = pd.DataFrame({'2021': {'_': 'q1', 'ID': '2', 'ID2': '2', 'name': 'B'},
'2021.1': {'_': 'q2', 'ID': '2', 'ID2': '2', 'name': 'B'}})
df2 = pd.DataFrame({'2021': {'_': 'q1', 'ID': '1', 'ID2': '3', 'name': 'A'},
'2021.1': {'_': 'q2', 'ID': '1', 'ID2': '3', 'name': 'A'}})
df = pd.concat([df1.T,df2.T])
df.index = df.index.str.split(".").str[0]
print(df)
# _ ID ID2 name
# 2021 q1 2 2 B
# 2021 q2 2 2 B
# 2021 q1 1 3 A
# 2021 q2 1 3 A
The .T
gives you the transposed dataframe.
Upvotes: 1