khu
khu

Reputation: 161

Adding DataFrame groups as data columns

I've been struggling to find a way to reshape my dataframe to my liking. I'm fairly new to Python and not familiar with all the methods of the dataframe. Particularly pivoting. I've read through the docs multiple times and still haven't found a solution.

(data below is random)

My original data pulled into a dataframe looks like this:

        ShellSurface         S1         S2  ElementHID  sx  sy  sz        sxy  
    0              1  88.340153 -88.340153         144   0   0   0  15.225413   
    1              1  66.370153 -66.370153         144   0   0   0  21.447455   
    2              1  74.422513 -74.422513         144   0   0   0  88.114254   
    3              1  22.324573 -22.324573         144   0   0   0  74.370153   
    4              2  14.322413 -14.322413         144   0   0   0  11.114425

There are 3 surfaces per element, and the elements are quadrilaterals, so they have 4 separate entries that need to be averaged in the file. I used frame.groupby(['ElementHID','ShellSurface']).mean() to accomplish this.

The new frame:

                                      S1           S2   sx  sy  sz          sxy  
    ElementHID ShellSurface                                                      
    144        1               22.310153   -22.310153   0   0   0    21.445778   
               2               17.114552   -17.114552   0   0   0    11.114452   
               3               79.370153   -79.370153   0   0   0    19.311443 

The problem is I need columns for the data in each surface (appending the surface number to the headers) with the elements as the row. So I need the column format below:

    ElementHID    S11    S12   S13   sx1   sx2  sx3  sy1  sy2  sy3  sxy1  sxy2  sxy3

The way I've previously done this is making an element class and giving it the attributes of my data columns for each surface, but it does not seem like the most efficient way to do it. I'm hoping there's a way to achieve this in Pandas.

I've also stacked the data, which gives me an easy way to loop through the data for each surface, but I'm still not sure how to use it to reshape the data.

Upvotes: 0

Views: 51

Answers (1)

chrisb
chrisb

Reputation: 52276

Starting with your grouped data, with the index reset to columns.

df = frame.groupby(['ElementHID','ShellSurface']).mean().reset_index()

You can use pivot_table to reshape the data. Index defines what variable defines the 'rows' of the table, and columns defines which variable(s) should be pivoted into columns.

In [233]: pivoted = df.pivot_table(index=['ElementHID'], columns=['ShellSurface'])
In [234]: pivoted
Out[234]: 
                     S1                    S2             sx     sy     sz     ...
ShellSurface          1          2          1          2   1  2   1  2   1  2   
ElementHID                                                                      
144           62.864348  14.322413 -62.864348 -14.322413   0  0   0  0   0  0   

The pivoted table will have a MultiIndex column, you can access groups relatively intuitively, for example:

In [235]: pivoted['S1'][1]
Out[235]: 
ElementHID
144           62.864348
Name: 1, dtype: float64

Or, if you want to collapse down to a list of joined column names, you could do something like this.

In [239]: pivoted.columns = [''.join((lvl1, str(lvl2))) for lvl1,lvl2 in pivoted.columns]
In [240]: pivoted
Out[240]: 
                  S11        S12        S21        S22  sx1  sx2  sy1  sy2  ...
ElementHID                                                                   
144         62.864348  14.322413 -62.864348 -14.322413    0    0    0    0 

Upvotes: 1

Related Questions