Reputation: 131
I'm translating some datacleaning-stuff previously done in SPSS modeler to Python. In SPSS you have a 'node' that is called restructure. I'm trying to figure out how to do the same operation in Python, but I'm struggling on how to achieve this. What it does is combining every value in column X with all values in different columns A,B,C, etc... .
So, original dataframe looks like this:
Code Freq1 Freq2
A01 1 7
B02 0 6
C03 17 8
And after transforming it it should look like this:
Code Freq1 Freq2 A01_Freq1 A01_Freq2 B02_Freq1 B02_Freq2 C03_Freq1 C03_Freq2
A01 1 7 1 7 Nan Nan Nan Nan
B02 0 6 Nan Nan 0 6 Nan Nan
C03 17 8 Nan Nan Nan Nan 17 8
I've tried some pivoting stuff, but I guess this cannot be done in one step in Python...
Upvotes: 1
Views: 94
Reputation: 863166
Use DataFrame.set_index
with DataFrame.unstack
and DataFrame.sort_index
for new DataFrame
with MultiIndex
, then flatten it by f-string
s and last add to original by DataFrame.join
:
df1 = df.set_index('Code', append=True).unstack().sort_index(axis=1, level=1)
df1.columns = df1.columns.map(lambda x: f'{x[1]}_{x[0]}')
df = df.join(df1)
print (df)
Code Freq1 Freq2 A01_Freq1 A01_Freq2 B02_Freq1 B02_Freq2 C03_Freq1 \
0 A01 1 7 1.0 7.0 NaN NaN NaN
1 B02 0 6 NaN NaN 0.0 6.0 NaN
2 C03 17 8 NaN NaN NaN NaN 17.0
C03_Freq2
0 NaN
1 NaN
2 8.0
Upvotes: 1