Reputation: 511
I'm using Python3.7 and Pandas v0.25.3. I have a dataframe like this,
pri_col col1 col2 Date
r1 3 4 2020-09-10
r2 4 1 2020-09-10
r1 2 7 2020-09-11
r3 6 4 2020-09-11
I want something like,
pri_col col1_2020-09-10 col2_2020-09-10 col1_2020-09-11 col2_2020-09-11
r1 3 4 NaN NaN
r2 4 1 NaN NaN
r1 NaN NaN 2 7
r3 NaN NaN 6 4
I have tried the following code:
import pandas as pd
df = pd.DataFrame([['r1', '3', '4', '2020-09-10'], ['r2', '4', '1', '2020-09-10'], ['r1', '2', '7', '2020-09-11'], ['r3', '6', '4', '2020-09-11']], columns=['pri_col', 'col1', 'col2', 'Date'])
print(df)
df_list = list()
for key, item in df.groupby('Date'):
item = item.drop('Date', axis=1)
item.columns = ['pri_col']+[str(x)+'_'+str(key) for x in item.columns if x != 'pri_col']
df_list.append(item)
finDf = pd.concat(df_list, sort=False)
print(finDf)
But this code is not optimized for large dataframes. Is it possible to have a vectorized implementation for this application in Pandas? Multilevel dataframe is also fine for me provided the code is vectorized. On top level I can have the value of 'Date' column and on next level I can have the columns.
Upvotes: 2
Views: 82
Reputation: 862471
Reshape with DataFrame.set_index
, DataFrame.stack
and DataFrame.unstack
is first idea, thn is necessary flatten MultiIndex
:
df = (df.set_index(['pri_col','Date'], append=True)
.stack()
.unstack([3,2])
)
df.columns = [f'{a}-{b}' for a, b in df.columns]
df = df.reset_index(level=1).
print (df)
pri_col col1-2020-09-10 col2-2020-09-10 col1-2020-09-11 col2-2020-09-11
0 r1 3.0 4.0 NaN NaN
1 r2 4.0 1.0 NaN NaN
2 r1 NaN NaN 2.0 7.0
3 r3 NaN NaN 6.0 4.0
Or reshape by DataFrame.melt
and DataFrame.pivot
:
df = (df.reset_index()
.melt(id_vars=['index','pri_col','Date'],
var_name='cols',
value_name='val')
.pivot(index=['index','pri_col'],
columns=['cols','Date'],
values='val'))
df.columns = [f'{a}-{b}' for a, b in df.columns]
df = df.reset_index(level=1).rename_axis(None)
print (df)
pri_col col1-2020-09-10 col1-2020-09-11 col2-2020-09-10 col2-2020-09-11
0 r1 3.0 NaN 4.0 NaN
1 r2 4.0 NaN 1.0 NaN
2 r1 NaN 2.0 NaN 7.0
3 r3 NaN 6.0 NaN 4.0
Upvotes: 8