Reputation: 23
Suppose, you have a dataframe
month 1 2 3
year
2019 a b c
2020 d e f
what I want is a transformed dataframe where the row names (year) and column names (month) serve as the index:
data
2019-01-01 a
2019-02-01 b
2019-03-01 c
2020-01-01 d
2020-02-01 e
2020-03-01 f
Is there an elegant way to do this in Pandas?
Minimal example to construct the dfs
# this builds the dataframe
import numpy as np
import pandas as pd
from datetime import date
df = pd.DataFrame(np.array([['a', 'b', 'c'], ['d', 'e', 'f']]), columns = [1, 2, 3], index = [2019, 2020])
df.columns.name = "month"
df.index.name = "year"
desired dataframe:
# this builds the desire dataframe
desired_index = [date(2019,1,1), date(2019,2,1), date(2019,3,1), date(2020,1,1), date(2020,2,1), date(2020,3,1)]
desired_df = pd.DataFrame(np.array(['a', 'b', 'c', 'd', 'e', 'f']), columns = ['data'], index = desired_index)
Upvotes: 2
Views: 40
Reputation: 863301
Use DataFrame.stack
with Series.to_frame
to one column DataFrame
and then convert flattened MultiIndex
to datetimes in list comprehension with f-string
s:
df = df.stack().to_frame('data')
df.index = pd.to_datetime([f'{y}-{m}-1' for y, m in df.index])
print (df)
data
2019-01-01 a
2019-02-01 b
2019-03-01 c
2020-01-01 d
2020-02-01 e
2020-03-01 f
Alternative solution with DataFrame.melt
and for correct order DataFrame.sort_index
:
df = df.reset_index().melt('year', value_name='data')
df.index = pd.to_datetime(df[['year', 'month']].assign(day=1))
df = df[['data']].sort_index()
print (df)
data
2019-01-01 a
2019-02-01 b
2019-03-01 c
2020-01-01 d
2020-02-01 e
2020-03-01 f
Upvotes: 2