justgoodin
justgoodin

Reputation: 311

How do I transform a MultiIndex to a dataframe with one index as rows, second index as columns and another columns as values

How to I convert the below multi-index dataframe:

open high low close volume
symbol date
4005 2017-12-03 07:00:00 38.75 38.75 38.75 38.75 2518
4333 2018-01-04 09:58:00 12.06 12.06 12.06 12.06 2000
4338 2018-01-22 10:06:00 9.22 9.22 9.22 9.22 10
2018-01-22 11:23:00 9.22 9.22 9.22 9.22 1000
4334 2017-12-14 08:41:00 9.28 9.28 9.28 9.28 2
4332 2018-01-02 10:19:00 10.30 10.30 10.30 10.30 100
2018-01-09 08:56:00 10.28 10.30 10.28 10.30 1500

To this table below with symbol as columns and "close" column as the value

4005 4333 4338 4334 4332
date
2017-12-03 07:00:00 38.75 nan nan nan nan
2018-01-04 09:58:00 nan 12.06 nan nan nan
2018-01-22 10:06:00 nan nan 9.22 nan nan
2018-01-22 11:23:00 nan nan 9.22 nan nan
2017-12-14 08:41:00 nan nan nan 9.28 nan
2018-01-02 10:19:00 nan nan nan nan 10.30
2018-01-09 08:56:00 nan nan nan nan 10.30

Upvotes: 2

Views: 61

Answers (1)

Henry Ecker
Henry Ecker

Reputation: 35626

Use pivot

Setup

import pandas as pd

df = pd.DataFrame({
    'symbol': {0: 4005, 1: 4333, 2: 4338, 3: 4338, 4: 4334, 5: 4332, 6: 4332},
    'date': {0: '2017-12-03 07:00:00', 1: '2018-01-04 09:58:00',
             2: '2018-01-22 10:06:00', 3: '2018-01-22 11:23:00',
             4: '2017-12-14 08:41:00', 5: '2018-01-02 10:19:00',
             6: '2018-01-09 08:56:00'},
    'open': {0: 38.75, 1: 12.06, 2: 9.22, 3: 9.22, 4: 9.28, 5: 10.3, 6: 10.28},
    'high': {0: 38.75, 1: 12.06, 2: 9.22, 3: 9.22, 4: 9.28, 5: 10.3, 6: 10.3},
    'low': {0: 38.75, 1: 12.06, 2: 9.22, 3: 9.22, 4: 9.28, 5: 10.3, 6: 10.28},
    'close': {0: 38.75, 1: 12.06, 2: 9.22, 3: 9.22, 4: 9.28, 5: 10.3, 6: 10.3},
    'volume': {0: 2518, 1: 2000, 2: 10, 3: 1000, 4: 2, 5: 100, 6: 1500}
})
# Recreate Multi-Index Data
df = df.set_index(['symbol', 'date'])

Create new_df with pivoted data.

# Reset Index and Pivot
new_df = df \
    .reset_index() \
    .pivot(index='date', columns='symbol', values='close') \
    .rename_axis(None, axis=1)

print(new_df)
                      4005  4332   4333  4334  4338
date                                               
2017-12-03 07:00:00  38.75   NaN    NaN   NaN   NaN
2017-12-14 08:41:00    NaN   NaN    NaN  9.28   NaN
2018-01-02 10:19:00    NaN  10.3    NaN   NaN   NaN
2018-01-04 09:58:00    NaN   NaN  12.06   NaN   NaN
2018-01-09 08:56:00    NaN  10.3    NaN   NaN   NaN
2018-01-22 10:06:00    NaN   NaN    NaN   NaN  9.22
2018-01-22 11:23:00    NaN   NaN    NaN   NaN  9.22

Upvotes: 1

Related Questions