Uis234
Uis234

Reputation: 261

Unpivot/Flatten pandas pivot table into one level index

For an analysis I start of with a dataframe, that looks like this (but is much larger):

      ID1        ID2           type       Number
0     IE345      E90              I           38
1     IE345      E92              E           26
2     IE345      E93              E           21
3     IE345      E95              R            9
4     IE346      E94              I           41
7     IE346      BLK              E            1

I would like to unpivot the following dataframe:

Df1 = pd.pivot_table(loads, values=['Number'], 
                       index = ['ID1', 'ID2'], 
                       columns=['Type'], margins=True, 
                       aggfunc=[sum] , fill_value=0)

Df1:

                               sum                                                    
                       Number                                                             
type                             B     D     E     I    L     R    All                         
ID1        ID2                                                                         
IE345      E90                   0     0     0    38    0     0     38    
           E92                   0     0    26     0    0     0     26    
           E93                   0     0    21     0    0     0     21    
           E95                   0     0     0     0    0     9      9    
IE346      E94                   0     0     0    41    0     0     41    
           BLK                   0     0     1     0    0     0      1  

Into:

Df1:

ID1        ID2                   B     D     E     I    L     R    All                                                                         
IE345      E90                   0     0     0    38    0     0     38    
IE345      E92                   0     0    26     0    0     0     26    
IE345      E93                   0     0    21     0    0     0     21    
IE345      E95                   0     0     0     0    0     9      9    
IE346      E94                   0     0     0    41    0     0     41    
IE346      BLK                   0     0     1     0    0     0      1

It seems that pandas.melt is what i am looking for, but cant manage to make it work.

Afterwards I would like to add the column name of highest value of a row, but without the margins of course:

ID1        ID2                   B     D     E     I    L     R    All   Max                                                                        
IE345      E90                   0     0     0    38    0     0     38    I    
IE345      E92                   0     0    26     0    0     0     26    E
IE345      E93                   0     0    21     0    6     0     27    E
IE345      E95                   0     0     0     0    0     9      9    R
IE345      E94                   0     0     0    41    0     0     41    I
IE345      BLK                   0     0     1     0    1     0      2    E

For the max value, I used:

df['Max'] = df.idxmax(axis=1, skipna=True)

but this takes the All with it unfortunately. Any ideas on how I can achieve my goal the most efficient way?

!!!EDIT!!!!

For the first part, I constructed a solution which returns exacte a unpivot pivot table! See here the code:

df.columns = df.columns.get_level_values('Type')
df.reset_index(inplace=True)

Now I tried the solution of Vmg for the Max value but unfortunately it returns:

ValueError: could not convert string to float: 

Any ideas on how to fix this?

Upvotes: 2

Views: 3785

Answers (2)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210852

As @vmg has already said idxmax will do the trick:

import io
import StringIO     # for Python 2.X
import pandas as pd

data = """\
      id1        id2           type       number
0     IE345      E90              I           38
1     IE345      E92              E           26
2     IE345      E93              E           21
3     IE345      E95              R            9
4     IE346      E94              I           41
7     IE346      BLK              E            1
"""

#loads = pd.read_csv(io.StringIO(data), sep='\s+', index_col=0)  # for Python 3.X
loads = pd.read_csv(StringIO.StringIO(data), sep='\s+', index_col=0)  # for Python 2.X

# **** interesting part starts here ****

# save all unique types, we will use it later
types = loads.type.unique()

df = pd.pivot_table(loads, values=['number'], 
                       index = ['id1', 'id2'], 
                       columns=['type'], margins=True, 
                       aggfunc='sum', fill_value=0) \
       .reset_index()

# reset column names
df.columns = [c[1] if c[1] else c[0] for c in df.columns.tolist()]

df['max'] = df[types].idxmax(axis=1)

Output:

In [266]: df
Out[266]:
     id1  id2     E     I    R    All max
0  IE345  E90   0.0  38.0  0.0   38.0   I
1  IE345  E92  26.0   0.0  0.0   26.0   E
2  IE345  E93  21.0   0.0  0.0   21.0   E
3  IE345  E95   0.0   0.0  9.0    9.0   R
4  IE346  BLK   1.0   0.0  0.0    1.0   E
5  IE346  E94   0.0  41.0  0.0   41.0   I
6    All       48.0  79.0  9.0  136.0   I

Upvotes: 2

vmg
vmg

Reputation: 4326

The first transformation you describe doesn't seem to be necessary, as it seems you still want ID1 and ID2 to be indexes. The first representation is just how pandas shows multi indexes, without repeating the first index for every entry in the second.

Your problem at hand, of applying idxmax without all can be achieved with:

proef['Dominant'] = proef.iloc[:,:-1].idxmax(axis=1, skipna=True)

where iloc[:,:-1] just means you're disregarding the rightmost column.

Upvotes: 0

Related Questions