Octavarium
Octavarium

Reputation: 131

'restructure' dataframe in python

I'm translating some datacleaning-stuff previously done in SPSS modeler to Python. In SPSS you have a 'node' that is called restructure. I'm trying to figure out how to do the same operation in Python, but I'm struggling on how to achieve this. What it does is combining every value in column X with all values in different columns A,B,C, etc... .

So, original dataframe looks like this:

Code         Freq1    Freq2   
A01          1        7               
B02          0        6               
C03          17       8              

And after transforming it it should look like this:

Code         Freq1    Freq2    A01_Freq1  A01_Freq2  B02_Freq1   B02_Freq2    C03_Freq1   C03_Freq2
A01          1        7        1          7          Nan         Nan          Nan         Nan
B02          0        6        Nan        Nan        0           6            Nan         Nan
C03          17       8        Nan        Nan        Nan         Nan          17          8       

I've tried some pivoting stuff, but I guess this cannot be done in one step in Python...

Upvotes: 1

Views: 94

Answers (1)

jezrael
jezrael

Reputation: 863166

Use DataFrame.set_index with DataFrame.unstack and DataFrame.sort_index for new DataFrame with MultiIndex, then flatten it by f-strings and last add to original by DataFrame.join:

df1 = df.set_index('Code', append=True).unstack().sort_index(axis=1, level=1)
df1.columns = df1.columns.map(lambda x: f'{x[1]}_{x[0]}')

df = df.join(df1)
print (df)
  Code  Freq1  Freq2  A01_Freq1  A01_Freq2  B02_Freq1  B02_Freq2  C03_Freq1  \
0  A01      1      7        1.0        7.0        NaN        NaN        NaN   
1  B02      0      6        NaN        NaN        0.0        6.0        NaN   
2  C03     17      8        NaN        NaN        NaN        NaN       17.0   

   C03_Freq2  
0        NaN  
1        NaN  
2        8.0  

Upvotes: 1

Related Questions