Reputation: 17
I have the following excel dataset:
($/bbl) ($/bbl) ($/bbl)
crude_petro crude_brent crude_dubai
1960M01 1.63 1.63 1.63
1960M02 1.63 1.63 1.63
1960M03 1.63 1.63 1.63
What I want to do is to convert the row names crude_petro, crude_brent, crude_dubai
into columns such that in the format like this:
EDIT 2
unit commodity price date
0 ($/bbl) crude_petro 1.63 1960M01
1 ($/bbl) crude_brent 1.63 1960M01
2 ($/bbl) crude_dubai 1.63 1960M01
How can I achieve this using pandas?
EDIT: This is how I am reading the excel to possibly parse those values
df = pd.read_excel(local_path, sheet_name='Monthly Prices', engine='openpyxl', skiprows=5, usecols="B:BT")
EDIT 3: In my final column, the data is generating extra columns that doesn't exist on my spreadsheet(source) which is associated with NaN values. For example, the commodity SILVER is outputting 'SILVER.1', 'SILVER.2', etc.
Upvotes: 0
Views: 426
Reputation: 35636
Try with melt
+ sort_index
:
new_df = (
df.melt(ignore_index=False, var_name=['unit', 'commodity'])
.sort_index()
.rename_axis('Date')
.reset_index()
)
new_df
:
Date unit commodity value
0 1960M01 ($/bbl) crude_petro 1.63
1 1960M01 ($/bbl) crude_brent 1.63
2 1960M01 ($/bbl) crude_dubai 1.63
3 1960M02 ($/bbl) crude_petro 1.63
4 1960M02 ($/bbl) crude_brent 1.63
5 1960M02 ($/bbl) crude_dubai 1.63
6 1960M03 ($/bbl) crude_petro 1.63
7 1960M03 ($/bbl) crude_brent 1.63
8 1960M03 ($/bbl) crude_dubai 1.63
Sample Frame Used:
df = pd.DataFrame({
('($/bbl)', 'crude_petro'): {'1960M01': 1.63, '1960M02': 1.63,
'1960M03': 1.63},
('($/bbl)', 'crude_brent'): {'1960M01': 1.63, '1960M02': 1.63,
'1960M03': 1.63},
('($/bbl)', 'crude_dubai'): {'1960M01': 1.63, '1960M02': 1.63,
'1960M03': 1.63}
})
df
:
($/bbl)
crude_petro crude_brent crude_dubai
1960M01 1.63 1.63 1.63
1960M02 1.63 1.63 1.63
1960M03 1.63 1.63 1.63
Upvotes: 1