Reputation: 986
I have a dataframe with the following structure:
         idx  value      Formula_name
0  123456789    100     Frequency No4
1  123456789    150    Frequency No25
2  123456789    125    Frequency No27
3  123456789    0.2   Power Level No4
4  123456789    0.5  Power Level No25
5  123456789   -1.0  Power Level No27
6  123456789     32           SNR No4
7  123456789     35          SNR No25
8  123456789     37          SNR No27
9  111222333    ...
So the only way to relate a frequency to its corresponding metric is via the number at the end of Formula_name (No4, No25, ...). I know the possible frequency range (100 to 200 MHz in steps of 25 MHz), but not which frequencies (or how many) show up in the data, nor which number is used to relate a frequency to a metric.
I would like to arrive at a dataframe like this:
                       SNR                     Power Level
         idx  100  125  150  175  200   100   125  150  175  200
0  123456789   32   37   35  NaN  NaN   0.2  -1.0  0.5  NaN  NaN
1  111222333  ...
For only one metric, I created two dataframes, one with the frequencies, one with the metric, and merged them on the number:
         idx Formula_x  value_x  number  Formula_y  value_y
0  123456789       SNR       32       4  frequency      100
1  123456789       SNR       35      25  frequency      150
Then I would unstack the dataframe:
df.groupby(['idx','value_y']).first()[['value_x']].unstack()
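A minimal, self-contained sketch of the single-metric approach described above. The regex split, the intermediate names (`parts`, `freq`, `snr`, `merged`, `result`) and the choice to merge on both `idx` and `number` are assumptions made for illustration; they are not taken from the original code.

```python
import pandas as pd

df = pd.DataFrame({
    "idx": [123456789] * 9,
    "value": [100, 150, 125, 0.2, 0.5, -1.0, 32, 35, 37],
    "Formula_name": [
        "Frequency No4", "Frequency No25", "Frequency No27",
        "Power Level No4", "Power Level No25", "Power Level No27",
        "SNR No4", "SNR No25", "SNR No27",
    ],
})

# Split e.g. "Power Level No4" into the formula name and the trailing number
# (assumed pattern: "<name> No<digits>").
parts = df["Formula_name"].str.extract(r"^(?P<Formula>.+) No(?P<number>\d+)$")
df = df.join(parts)

# One frame with the frequencies, one with a single metric (SNR here).
freq = df[df["Formula"] == "Frequency"]
snr = df[df["Formula"] == "SNR"]

# After the merge, value_x holds the metric and value_y the frequency.
merged = snr.merge(freq, on=["idx", "number"])

# Same unstack step as in the question.
result = merged.groupby(["idx", "value_y"]).first()[["value_x"]].unstack()
print(result)
```

Merging on `idx` as well as `number` is a precaution in case the same number refers to different frequencies for different `idx` values.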
This works for one metric, but I don't really see how I can apply it to more metrics and access them with a multiindex in the columns.
Any ideas and suggestions would be very welcome.
Upvotes: 3
Views: 14298
Reputation: 863256
You can use:
print (df)
         idx  value      Formula_name
0  123456789  100.0     Frequency No4
1  123456789  150.0    Frequency No25
2  123456789  125.0    Frequency No27
3  123456789    0.2   Power Level No4
4  123456789    0.5  Power Level No25
5  123456789   -1.0  Power Level No27
6  123456789   32.0           SNR No4
7  123456789   35.0          SNR No25
8  123456789   37.0          SNR No27
# create new columns a (metric name) and b (number label) from Formula_name
df[['a','b']] = df.Formula_name.str.rsplit(n=1, expand=True)
# map the labels in column b (No4, No25, ...) to their frequencies (100, 150, ...)
maps = df[df.a == 'Frequency'].set_index('b')['value'].astype(int)
df['b'] = df.b.map(maps)
# drop the Frequency rows and the Formula_name column
df1 = df[df.a != 'Frequency'].drop('Formula_name', axis=1)
print (df1)
         idx  value            a    b
3  123456789    0.2  Power Level  100
4  123456789    0.5  Power Level  150
5  123456789   -1.0  Power Level  125
6  123456789   32.0          SNR  100
7  123456789   35.0          SNR  150
8  123456789   37.0          SNR  125
Two solutions: with unstack and with pivot_table.
df2 = df1.set_index(['idx','a', 'b']).unstack([1,2])
df2.columns = df2.columns.droplevel(0)
df2 = df2.rename_axis(None).rename_axis([None, None], axis=1)
print (df2)
          Power Level              SNR
                  100  150  125    100   150   125
123456789         0.2  0.5 -1.0   32.0  35.0  37.0
df3 = df1.pivot_table(index='idx', columns=['a','b'], values='value')
df3 = df3.rename_axis(None).rename_axis([None, None], axis=1)
print (df3)
          Power Level              SNR
                  100  125  150    100   125   150
123456789         0.2 -1.0  0.5   32.0  37.0  35.0
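The question also asks for columns covering every possible frequency (100 to 200 MHz in steps of 25 MHz), with NaN where a frequency does not occur. One way to get there, sketched below, is to reindex the pivoted columns against the full grid. The names `all_freqs`, `full_cols` and `df4` are introduced here for illustration only, and the sketch assumes each number label maps to a single frequency across the whole frame.

```python
import pandas as pd

df = pd.DataFrame({
    "idx": [123456789] * 9,
    "value": [100, 150, 125, 0.2, 0.5, -1.0, 32, 35, 37],
    "Formula_name": [
        "Frequency No4", "Frequency No25", "Frequency No27",
        "Power Level No4", "Power Level No25", "Power Level No27",
        "SNR No4", "SNR No25", "SNR No27",
    ],
})

# Same preparation as above: split the name, map labels to frequencies.
df[["a", "b"]] = df["Formula_name"].str.rsplit(n=1, expand=True)
maps = df.loc[df["a"] == "Frequency"].set_index("b")["value"].astype(int)
df["b"] = df["b"].map(maps)
df1 = df.loc[df["a"] != "Frequency"].drop(columns="Formula_name")

df3 = df1.pivot_table(index="idx", columns=["a", "b"], values="value")

# Build the full (metric, frequency) grid and reindex the columns against it;
# combinations missing from the data become NaN columns.
all_freqs = range(100, 201, 25)
metrics = df3.columns.get_level_values("a").unique()
full_cols = pd.MultiIndex.from_product([metrics, all_freqs], names=["a", "b"])
df4 = df3.reindex(columns=full_cols)
print(df4)
```

A single metric can then be selected with `df4["SNR"]`, which returns a frame whose columns are the five frequencies.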
Upvotes: 5