arnav chauhan
arnav chauhan

Reputation: 1

Unpivot Dataframe with multilayered headers into long format

I have an excel file which has a pivoted table. I want to bring it into a long format so the same can be inserted into a DB.

PFA the image for reference

loading the file in df.

file_path = '/content/drive/My Drive/Experiment_unpivot.xlsx' df = pd.read_excel(file_path,header=[0,1])

df the columns are merged in the excel file so getting Unnamed: 0_level_0 2023-09-01 00:00:00

Please advise what needs to be done. Using pd.melt or using power query. I want the melted table

Upvotes: 0

Views: 80

Answers (1)

horseyride
horseyride

Reputation: 21428

Your data in general makes no sense because it has the same column names twice under each month. However, here is sample code for Powerquery

let Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Removed Top Rows" = Table.Skip(Source,1),
#"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows"),
GetMonthNames = Table.FromColumns({Record.FieldValues(Source{0})}),
#"Filled Down" = Table.ToColumns(Table.FillDown(GetMonthNames,{"Column1"})){0},
NewColumnNames_Part1 = List.Zip ({#"Filled Down",Table.ColumnNames(#"Promoted Headers")}),
NewColumnNames_Part2 = List.Transform(NewColumnNames_Part1, each Text.Combine(_,"-")),
#"Rename Columns"=Table.RenameColumns(Source,List.Zip({Table.ColumnNames(Source),NewColumnNames_Part2}))  ,
#"Removed Two Rows" = Table.Skip(#"Rename Columns",2),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Removed Two Rows", {"Products"}, "Attribute", "Value")
in #"Unpivoted Other Columns"

You can right-click the Attribute column and split on - character if you want

enter image description here

Upvotes: 0

Related Questions