groupby/unstack on columns name

Question

I have a dataframe with the following structure:

    idx           value   Formula_name
0   123456789     100     Frequency No4
1   123456789     150     Frequency No25
2   123456789     125     Frequency No27
3   123456789     0.2     Power Level No4
4   123456789     0.5     Power Level No25
5   123456789     -1.0    Power Level No27
6   123456789     32      SNR  No4
7   123456789     35      SNR  No25
8   123456789     37      SNR  No27
9   111222333     ...

So the only way to relate a frequency to its corresponding metric is via the number of the frequency. I know the possible range (from 100 to 200 MHz in steps of 25 MHz), but not which frequencies (or how many) show up in the data, nor which "number" is used to relate the frequency to the metric.

I would like to arrive at a dataframe similar to this:

                  SNR                        Power Level
    idx           100   125  150   175  200  100  125  150 175 200
0   123456789     32    37   35    NaN  NaN  0.2  -1.0 0.5 NaN NaN
1   111222333     ...

For only one metric, I created two dataframes, one with the frequencies, one with the metric, and merged them on the number:

     idx         Formula_x  value_x number   Formula_y  value_y
0    123456789   SNR        32      4        frequency  100
1    123456789   SNR        35      25       frequency  150

Then I would unstack the dataframe:

df.groupby(['idx','value_y']).first()[['value_x']].unstack()

This works for one metric, but I don't really see how I can apply it to more metrics and access them with a multiindex in the columns.

Any ideas and suggestions would be very welcome.

jezrael · Accepted Answer

You can use:

print (df)
         idx  value      Formula_name
0  123456789  100.0     Frequency No4
1  123456789  150.0    Frequency No25
2  123456789  125.0    Frequency No27
3  123456789    0.2   Power Level No4
4  123456789    0.5  Power Level No25
5  123456789   -1.0  Power Level No27
6  123456789   32.0           SNR No4
7  123456789   35.0          SNR No25
8  123456789   37.0          SNR No27

# split Formula_name into two new columns: a (metric name) and b (number, e.g. No4)
df[['a','b']] = df.Formula_name.str.rsplit(n=1, expand=True)

# build a lookup Series from the Frequency rows (No4 -> 100, No25 -> 150, ...)
# and map column b through it
maps = df[df.a == 'Frequency'].set_index('b')['value'].astype(int)
df['b'] = df.b.map(maps)

# drop the Frequency rows and the Formula_name column
df1 = df[df.a != 'Frequency'].drop('Formula_name', axis=1)
print (df1)
         idx  value            a    b
3  123456789    0.2  Power Level  100
4  123456789    0.5  Power Level  150
5  123456789   -1.0  Power Level  125
6  123456789   32.0          SNR  100
7  123456789   35.0          SNR  150
8  123456789   37.0          SNR  125
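
One caveat: maps is built across the whole frame, so it assumes that every idx uses the same number for the same frequency. The question says the numbering is not known in advance, and if two idx values reuse a number such as No4 for different frequencies, the lookup Series has duplicate labels and map can no longer resolve them unambiguously. Below is a minimal sketch of a per-idx lookup via merge. The rows for the second idx are made up for illustration, and the column name freq is an assumption, not something from the original data:

```python
import pandas as pd

df = pd.DataFrame({
    "idx": [123456789] * 9 + [111222333] * 4,
    "value": [100, 150, 125, 0.2, 0.5, -1.0, 32, 35, 37, 175, 100, 30, 28],
    "Formula_name": [
        "Frequency No4", "Frequency No25", "Frequency No27",
        "Power Level No4", "Power Level No25", "Power Level No27",
        "SNR No4", "SNR No25", "SNR No27",
        # hypothetical second idx that reuses No4 / No25 for other frequencies
        "Frequency No4", "Frequency No25", "SNR No4", "SNR No25",
    ],
})

# split "Power Level No4" into metric name (a) and number (b)
df[["a", "b"]] = df["Formula_name"].str.rsplit(n=1, expand=True)

is_freq = df["a"] == "Frequency"

# one row per (idx, number) holding the frequency value
freqs = (
    df.loc[is_freq, ["idx", "b", "value"]]
      .rename(columns={"value": "freq"})
      .astype({"freq": int})
)

# attach the frequency to every metric row with the same idx AND number
df1 = (
    df.loc[~is_freq, ["idx", "value", "a", "b"]]
      .merge(freqs, on=["idx", "b"], how="left")
      .drop(columns="b")
)
print(df1)
```

Here freq plays the role that column b has after the map step, so the reshaping that follows would use 'freq' in place of 'b'.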

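There are two solutions: one with unstack and one with pivot_table. Note that pivot_table sorts the column labels and, if an (idx, a, b) combination occurs more than once, aggregates the duplicates (with the mean by default), whereas unstack raises an error on duplicate entries.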

df2 = df1.set_index(['idx','a', 'b']).unstack([1,2])
df2.columns = df2.columns.droplevel(0)
df2 = df2.rename_axis(None).rename_axis([None, None], axis=1)
print (df2)
          Power Level             SNR            
                  100  150  125   100   150   125
123456789         0.2  0.5 -1.0  32.0  35.0  37.0

df3 = df1.pivot_table(index='idx', columns=['a','b'], values='value')
df3 = df3.rename_axis(None).rename_axis([None, None], axis=1)
print (df3)
          Power Level             SNR            
                  100  125  150   100   125   150
123456789         0.2 -1.0  0.5  32.0  37.0  35.0
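
The desired output in the question also has NaN columns for frequencies that never occur in the data (175 and 200). Neither solution above creates those, because only the observed frequencies become columns. A sketch of one way to add them, assuming the 100 to 200 MHz grid in steps of 25 MHz from the question and reindexing the columns against the full product of metrics and frequencies (the names all_freqs, full_cols and df4 are only illustrative):

```python
import pandas as pd

# df1 as printed above: one row per (idx, metric, frequency)
df1 = pd.DataFrame({
    "idx": [123456789] * 6,
    "value": [0.2, 0.5, -1.0, 32.0, 35.0, 37.0],
    "a": ["Power Level"] * 3 + ["SNR"] * 3,
    "b": [100, 150, 125, 100, 150, 125],
})

df3 = df1.pivot_table(index="idx", columns=["a", "b"], values="value")

# every metric crossed with every possible frequency (100..200 MHz, step 25)
all_freqs = range(100, 201, 25)
metrics = df3.columns.get_level_values("a").unique()
full_cols = pd.MultiIndex.from_product([metrics, all_freqs], names=["a", "b"])

# frequencies absent from the data become all-NaN columns
df4 = df3.reindex(columns=full_cols)
print(df4)
```

A single metric can then be selected with df4['SNR'], which gives one column per frequency on the grid.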
