Vladimir Emelianov

Reputation: 91

How to convert wide data to long data based on three variables

I have a dataframe that is structured as such:

Item FY20 FY21 FY22 ...  
Case High Low Base
Multiple 1.2 2.3 3.4
Cash 1.1 1.4 1.2

I need the data to look like this:

Item Date Case Value
Cash FY20 High 1.1
Cash FY21 Low 1.4
Cash FY22 Base 1.2

So I essentially want to transform the data from wide format to a long format based on "Case", the "FY"s and the item.

I've already tried using a MultiIndex and messed around a bit with pd.pivot, but I'm honestly stumped here.

Upvotes: 0

Views: 66

Answers (2)

Scott Boston

Reputation: 153500

IIUC, you can use this bit of code to reshape your dataframe:

(df.set_index('Item')        # move Item into the dataframe index
   .T                        # transpose the dataframe
   .rename_axis('Date')      # rename the index to Date
   .reset_index()            # move the index back into the dataframe as a column
   .melt(['Date', 'Case']))  # melt the dataframe to get the long format

Output:

   Date  Case      Item value
0  FY20  High  Multiple   1.2
1  FY21   Low  Multiple   2.3
2  FY22  Base  Multiple   3.4
3  FY20  High      Cash   1.1
4  FY21   Low      Cash   1.4
5  FY22  Base      Cash   1.2

Details:

Where df is:

       Item  FY20 FY21  FY22
0      Case  High  Low  Base
1  Multiple   1.2  2.3   3.4
2      Cash   1.1  1.4   1.2
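
For anyone who wants to reproduce this, here is a minimal sketch of how such a frame could be built. The construction is my assumption (the question only shows the printed layout): the Case labels sit in the first data row, so the FY columns end up holding a mix of strings and numbers.

```python
import pandas as pd

# Assumed construction of the question's frame: the "Case" labels live in
# the first data row, so each FY column mixes strings and floats.
df = pd.DataFrame({
    "Item": ["Case", "Multiple", "Cash"],
    "FY20": ["High", 1.2, 1.1],
    "FY21": ["Low", 2.3, 1.4],
    "FY22": ["Base", 3.4, 1.2],
})

# The same chain as above, wrapped in parentheses so it can span lines.
long_df = (
    df.set_index("Item")
      .T
      .rename_axis("Date")
      .reset_index()
      .melt(["Date", "Case"])
)
print(long_df)
```

The melted variable column is called Item because the transposed frame's column index still carries the name Item, and melt uses that name when var_name is not given.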

df.set_index('Item').T  

Almost there:

Item  Case Multiple Cash
FY20  High      1.2  1.1
FY21   Low      2.3  1.4
FY22  Base      3.4  1.2

df.set_index('Item').T.rename_axis('Date').reset_index()

Add rename_axis and reset_index to prepare the dataframe for melt:

Item  Date  Case Multiple Cash
0     FY20  High      1.2  1.1
1     FY21   Low      2.3  1.4
2     FY22  Base      3.4  1.2

Lastly, melt the dataframe:

df.set_index('Item').T.rename_axis('Date').reset_index().melt(['Date', 'Case'])

Output:

   Date  Case      Item value
0  FY20  High  Multiple   1.2
1  FY21   Low  Multiple   2.3
2  FY22  Base  Multiple   3.4
3  FY20  High      Cash   1.1
4  FY21   Low      Cash   1.4
5  FY22  Base      Cash   1.2

And if you only want the "Cash" records, use this:

df_out = df.set_index('Item').T.rename_axis('Date').reset_index().melt(['Date', 'Case'])
df_out.query('Item == "Cash"')

Output:

   Date  Case  Item value
3  FY20  High  Cash   1.1
4  FY21   Low  Cash   1.4
5  FY22  Base  Cash   1.2
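
If you want the exact layout from the question (Item, Date, Case, Value), something along these lines might work. This is only a sketch: the frame construction, the value_name='Value' argument, the pd.to_numeric step and the column reordering are my additions, not part of the code above.

```python
import pandas as pd

# Assumed construction of the question's frame (not shown by the asker).
df = pd.DataFrame({
    "Item": ["Case", "Multiple", "Cash"],
    "FY20": ["High", 1.2, 1.1],
    "FY21": ["Low", 2.3, 1.4],
    "FY22": ["Base", 3.4, 1.2],
})

df_out = (
    df.set_index("Item")
      .T
      .rename_axis("Date")
      .reset_index()
      .melt(["Date", "Case"], value_name="Value")  # name the value column "Value"
)

# The values start out as object dtype because each FY column also held a
# Case label, so convert them to numbers explicitly.
df_out["Value"] = pd.to_numeric(df_out["Value"])

# Keep only the Cash rows and put the columns in the order the question asks for.
result = (
    df_out.loc[df_out["Item"] == "Cash", ["Item", "Date", "Case", "Value"]]
          .reset_index(drop=True)
)
print(result)
```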

Upvotes: 0

Valdi_Bo

Reputation: 30991

Let's start by creating your source DataFrame:

import pandas as pd

df = pd.DataFrame(data=[
    [ 'Item',     'FY20', 'FY21', 'FY22' ],
    [ 'Case',     'High', 'Low',  'Base' ],
    [ 'Multiple', 1.2,    2.3,    3.4    ],
    [ 'Cash',     1.1,    1.4,    1.2    ]])

The result is:

          0     1     2     3
0      Item  FY20  FY21  FY22
1      Case  High   Low  Base
2  Multiple   1.2   2.3   3.4
3      Cash   1.1   1.4   1.2

Then we have to:

  • transpose this DataFrame,
  • convert the first row into column names and drop that row,
  • change the first column name from Item to Date.

To do this, run:

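df2 = df.transpose()
df2.columns = df2.iloc[0].tolist()   # use the first row as column names
df2.drop(index=0, inplace=True)      # drop the row that held the names
df2 = df2.rename(columns={'Item': 'Date'})  # rename returns a copy, so assign it back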

The result is:

   Date  Case Multiple Cash
1  FY20  High      1.2  1.1
2  FY21   Low      2.3  1.4
3  FY22  Base      3.4  1.2

And to get your result, run:

df2.melt(id_vars=['Date', 'Case'], value_vars=['Cash'],
    var_name='Name', value_name='Value')

and you will receive:

   Date  Case  Name Value
0  FY20  High  Cash   1.1
1  FY21   Low  Cash   1.4
2  FY22  Base  Cash   1.2

Or maybe the result should also include the Multiple column? To achieve this, remove value_vars=['Cash']. Melting will then include all remaining columns (those not listed in id_vars).
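
For completeness, a sketch of that variant from start to finish, assuming the same header-less source DataFrame as above. The name long_all is just one I picked for illustration.

```python
import pandas as pd

# Source frame with the header row stored as ordinary data (row 0).
df = pd.DataFrame(data=[
    ['Item',     'FY20', 'FY21', 'FY22'],
    ['Case',     'High', 'Low',  'Base'],
    ['Multiple', 1.2,    2.3,    3.4],
    ['Cash',     1.1,    1.4,    1.2]])

df2 = df.transpose()
df2.columns = df2.iloc[0].tolist()                        # first row becomes the column names
df2 = df2.drop(index=0).rename(columns={'Item': 'Date'})  # drop that row, rename Item to Date

# No value_vars: every column other than Date and Case gets melted.
long_all = df2.melt(id_vars=['Date', 'Case'],
                    var_name='Name', value_name='Value')
print(long_all)
```

This should give six rows, the three Multiple rows followed by the three Cash rows.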

Upvotes: 2
