Reputation: 167
I have the following data. It is all in one excel file.
Sheet name: may2019
Productivity Count
Date : 01-Apr-2020 00:00 to 30-Apr-2020 23:59
Date Type: Finalized Date Modality: All
Name MR DX CT US MG BMD TOTAL
Svetlana 29 275 101 126 5 5 541
Kate 32 652 67 171 1 0 923
Andrew 0 452 0 259 1 0 712
Tom 50 461 61 104 4 0 680
Maya 0 353 0 406 0 0 759
Ben 0 1009 0 143 0 0 1152
Justin 0 2 9 0 1 9 21
Total 111 3204 238 1209 12 14 4788
Sheet Name: June 2020
Productivity Count
Date : 01-Jun-2019 00:00 to 30-Jun-2019 23:59
Date Type: Finalized Date Modality: All
NAme US DX CT MR MG BMD TOTAL
Svetlana 4 0 17 6 0 4 31
Kate 158 526 64 48 1 0 797
Andrew 154 230 0 0 0 0 384
Tom 1 0 19 20 2 8 50
Maya 260 467 0 0 1 1 729
Ben 169 530 59 40 3 0 801
Justin 125 164 0 0 4 0 293
Alvin 0 1 0 0 0 0 1
Total 871 1918 159 114 11 13 3086
I want to merge all the sheets into one sheet and drop the first 3 rows of every sheet. This is the output I am looking for:
Sl.No Name US_jun2019 DX_jun2019 CT_jun2019 MR_jun2019 MG_jun2019 BMD_jun2019 TOTAL_jun2019 MR_may2019 DX_may2019 CT_may2019 US_may2019 MG_may2019 BMD_may2019 TOTAL_may2019
1 Svetlana 4 0 17 6 0 4 31 29 275 101 126 5 5 541
2 Kate 158 526 64 48 1 0 797 32 652 67 171 1 0 923
3 Andrew 154 230 0 0 0 0 384 0 353 0 406 0 0 759
4 Tom 1 0 19 20 2 8 50 0 2 9 0 1 9 21
5 Maya 260 467 0 0 1 1 729 0 1009 0 143 0 0 1152
6 Ben 169 530 59 40 3 0 801 50 461 61 104 4 0 680
7 Justin 125 164 0 0 4 0 293 0 452 0 259 1 0 712
8 Alvin 0 1 0 0 0 0 1 #N/A #N/A #N/A #N/A #N/A #N/A #N/A
I tried the following code, but the output is not what I am looking for.
df=pd.concat(df,sort=False)
df= df.drop(df.index[[0,1]])
df=df.rename(columns=df.iloc[0])
df= df.drop(df.index[[0]])
df=df.drop(['Sl.No'], axis = 1)
print(df)
Upvotes: 1
Views: 751
Reputation: 1707
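First, read both Excel sheets (use the sheet names exactly as they appear in your workbook). Pass header=None so that pandas does not consume the first title row as column names; that way the three title rows can be dropped explicitly in the next step.
>>> df1 = pd.read_excel('path/to/excel/file.xlsx', sheet_name="may2019", header=None)
>>> df2 = pd.read_excel('path/to/excel/file.xlsx', sheet_name="jun2019", header=None)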
Drop the first three rows.
>>> df1.drop(index=range(3), inplace=True)
>>> df2.drop(index=range(3), inplace=True)
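Rename the columns to the values in the first remaining row, then drop that row. Drop it by position (df.index[0]), because the row labelled 0 was already removed in the previous step and drop(index=[0]) would raise a KeyError.
>>> df1.rename(columns=dict(zip(df1.columns, df1.iloc[0])), inplace=True)
>>> df1.drop(index=df1.index[0], inplace=True)
>>> df2.rename(columns=dict(zip(df2.columns, df2.iloc[0])), inplace=True)
>>> df2.drop(index=df2.index[0], inplace=True)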
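Add suffixes to the value columns. Leave the name column unsuffixed (and normalize its spelling, since one sheet has "NAme"), otherwise there is no 'Name' column left to drop in the next step.
>>> df1.rename(columns=lambda c: 'Name' if c.lower() == 'name' else c + '_may2019', inplace=True)
>>> df2.rename(columns=lambda c: 'Name' if c.lower() == 'name' else c + '_jun2019', inplace=True)
Remove the duplicate name column in the second DF.
>>> df2.drop(columns=['Name'], inplace=True)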
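Concatenate both dataframes side by side. pd.concat returns a new DataFrame and has no inplace argument. Note that with axis=1 the rows are paired by index, not by name, so this only lines up when both sheets list the same people in the same order.
>>> df = pd.concat([df1, df2], axis=1)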
All the code in one place:
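import pandas as pd
# header=None keeps the title rows as data so they can be dropped explicitly
df1 = pd.read_excel('path/to/excel/file.xlsx', sheet_name="may2019", header=None)
df2 = pd.read_excel('path/to/excel/file.xlsx', sheet_name="jun2019", header=None)
# Drop the three title rows above the real header row
df1.drop(index=range(3), inplace=True)
df2.drop(index=range(3), inplace=True)
# Promote the first remaining row to column names, then drop it by position
df1.rename(columns=dict(zip(df1.columns, df1.iloc[0])), inplace=True)
df1.drop(index=df1.index[0], inplace=True)
df2.rename(columns=dict(zip(df2.columns, df2.iloc[0])), inplace=True)
df2.drop(index=df2.index[0], inplace=True)
# Suffix every column except the name column (this also normalizes "NAme")
df1.rename(columns=lambda c: 'Name' if c.lower() == 'name' else c + '_may2019', inplace=True)
df2.rename(columns=lambda c: 'Name' if c.lower() == 'name' else c + '_jun2019', inplace=True)
# Remove the duplicate name column from the second DataFrame
df2.drop(columns=['Name'], inplace=True)
# pd.concat returns a new DataFrame; rows are paired by index, not by name
df = pd.concat([df2, df1], axis=1)
print(df)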
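Because pd.concat with axis=1 pairs rows by index, a person who appears in only one sheet (Alvin in the question) ends up misaligned or without a name. A minimal sketch of an alternative, assuming every sheet is read with header=None and the name column comes first: clean each sheet, then outer-merge on Name. The helper names tidy_sheet and merge_sheets and the trimmed in-memory sheets below are illustrative only, and dropping each sheet's Total row is an assumption based on the desired output.

```python
from functools import reduce

import pandas as pd


def tidy_sheet(raw, suffix, skip_rows=3):
    """Clean one raw sheet (read with header=None) into a Name-keyed table."""
    body = raw.iloc[skip_rows:].reset_index(drop=True)
    header = [str(h).strip() for h in body.iloc[0]]
    data = body.iloc[1:].copy()
    # Assumption: the first column holds the names ("Name", "NAme", ...).
    data.columns = ["Name"] + [f"{h}_{suffix}" for h in header[1:]]
    # Assumption: the per-sheet "Total" row is not wanted in the merged output.
    data = data[data["Name"].astype(str).str.lower() != "total"]
    return data.reset_index(drop=True)


def merge_sheets(sheets):
    """Outer-merge all cleaned sheets on Name; `sheets` maps sheet name -> raw frame."""
    tidy = [tidy_sheet(raw, name) for name, raw in sheets.items()]
    return reduce(lambda left, right: left.merge(right, on="Name", how="outer"), tidy)


# Trimmed stand-ins for the two sheets (only two value columns each).
# With a real workbook, a dict of this shape comes from:
#   pd.read_excel('path/to/excel/file.xlsx', sheet_name=None, header=None)
may = pd.DataFrame([
    ["Productivity Count", None, None],
    ["Date : 01-Apr-2020 00:00 to 30-Apr-2020 23:59", None, None],
    ["Date Type: Finalized Date Modality: All", None, None],
    ["Name", "MR", "DX"],
    ["Svetlana", 29, 275],
    ["Kate", 32, 652],
    ["Total", 111, 3204],
])
jun = pd.DataFrame([
    ["Productivity Count", None, None],
    ["Date : 01-Jun-2019 00:00 to 30-Jun-2019 23:59", None, None],
    ["Date Type: Finalized Date Modality: All", None, None],
    ["NAme", "US", "DX"],
    ["Svetlana", 4, 0],
    ["Kate", 158, 526],
    ["Alvin", 0, 1],
    ["Total", 871, 1918],
])

merged = merge_sheets({"jun2019": jun, "may2019": may})
print(merged)
```

With how="outer", names missing from a sheet get NaN in that sheet's columns, which corresponds to the #N/A cells in the desired output. The row order after an outer merge can differ between pandas versions, so sort explicitly if the order matters.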
Upvotes: 1