Asif Iqbal
Asif Iqbal

Reputation: 511

Create columns for Pandas Dataframe according to value of other column

I'm using Python3.7 and Pandas v0.25.3. I have a dataframe like this,

pri_col col1 col2        Date
     r1    3    4  2020-09-10
     r2    4    1  2020-09-10
     r1    2    7  2020-09-11
     r3    6    4  2020-09-11

I want something like,

pri_col col1_2020-09-10 col2_2020-09-10 col1_2020-09-11 col2_2020-09-11
   r1               3               4             NaN             NaN
   r2               4               1             NaN             NaN
   r1             NaN             NaN               2               7
   r3             NaN             NaN               6               4

I have tried the following code:

import pandas as pd
df = pd.DataFrame([['r1', '3', '4', '2020-09-10'], ['r2', '4', '1', '2020-09-10'], ['r1', '2', '7', '2020-09-11'], ['r3', '6', '4', '2020-09-11']], columns=['pri_col', 'col1', 'col2', 'Date'])
print(df)
df_list = list()
for key, item in df.groupby('Date'):
    item = item.drop('Date', axis=1)
    item.columns = ['pri_col']+[str(x)+'_'+str(key) for x in item.columns if x != 'pri_col']
    df_list.append(item)
finDf = pd.concat(df_list, sort=False)
print(finDf)

But this code is not optimized for large dataframes. Is it possible to have a vectorized implementation for this application in Pandas? Multilevel dataframe is also fine for me provided the code is vectorized. On top level I can have the value of 'Date' column and on next level I can have the columns.

Upvotes: 2

Views: 82

Answers (1)

jezrael
jezrael

Reputation: 862471

Reshape with DataFrame.set_index, DataFrame.stack and DataFrame.unstack is first idea, thn is necessary flatten MultiIndex:

df = (df.set_index(['pri_col','Date'], append=True)
        .stack()
        .unstack([3,2])
        )
df.columns = [f'{a}-{b}' for a, b in df.columns]
df = df.reset_index(level=1).
print (df)
  pri_col  col1-2020-09-10  col2-2020-09-10  col1-2020-09-11  col2-2020-09-11
0      r1              3.0              4.0              NaN              NaN
1      r2              4.0              1.0              NaN              NaN
2      r1              NaN              NaN              2.0              7.0
3      r3              NaN              NaN              6.0              4.0

Or reshape by DataFrame.melt and DataFrame.pivot:

df = (df.reset_index()
        .melt(id_vars=['index','pri_col','Date'], 
              var_name='cols',
              value_name='val')
        .pivot(index=['index','pri_col'],
               columns=['cols','Date'],
               values='val'))
df.columns = [f'{a}-{b}' for a, b in df.columns]
df = df.reset_index(level=1).rename_axis(None)
print (df)
  pri_col  col1-2020-09-10  col1-2020-09-11  col2-2020-09-10  col2-2020-09-11
0      r1              3.0              NaN              4.0              NaN
1      r2              4.0              NaN              1.0              NaN
2      r1              NaN              2.0              NaN              7.0
3      r3              NaN              6.0              NaN              4.0

Upvotes: 8

Related Questions