Pandas groupby indexing preservation

Question

I have a dataframe in pandas that looks like this:

In[12]: a
Out[12]: 
    alpha  beta  gamma           G1           G2           G3  \
0     7         1          1     5.617164     9.048353     2.229996   
1     7         2          2  1469.200000  1475.224000  1468.049000   
2     7         3         62     5.595052     8.975431     2.203285   
3     7         4         62   355.046700   -29.822800     6.901535   
4     7         5         62   706.328700   -67.995310     9.047969   
5     7         6         62   882.444500   -87.075830    10.673920   
6     7         7         62   706.259600   -67.769160     9.366320   
7     7         8         62   355.401700   -29.832250     6.651977   
8     7         9         62     5.669636     9.041898     2.168643   
9     7        10         12     5.663535     9.039821     2.166068   
10    7        11         22  1469.277000  1475.238000  1468.002000   

            G4          G5         G6  
0    -67.46422   -22.76504  -142.4664  
1   1397.88500  1450.74000  1328.4590  
2    -67.48421   -22.68970  -142.5102  
3    -68.16720   -22.94553  -134.2837  
4    -68.53533   -22.72636  -128.2633  
5    -68.89483   -22.90121  -129.0006  
6    -68.86677   -22.81106  -127.4094  
7    -68.27580   -22.86239  -133.6061  
8    -67.44511   -22.82177  -142.4721  
9    -67.44169   -22.79920  -142.4393  
10  1397.89300  1450.75400  1328.5100

I am using pandas.groupby function to group the dataframe by the columns 'alpha' and 'beta'. When I run the function, the output of the groups looks as intended, with indexing order preserved.

b = a.groupby(['alpha','beta'])

b.groups
Out[18]: 
{(7, 1): Index(['0'], dtype='object'),
 (7, 2): Index(['1'], dtype='object'),
 (7, 3): Index(['2'], dtype='object'),
 (7, 4): Index(['3'], dtype='object'),
 (7, 5): Index(['4'], dtype='object'),
 (7, 6): Index(['5'], dtype='object'),
 (7, 7): Index(['6'], dtype='object'),
 (7, 8): Index(['7'], dtype='object'),
 (7, 9): Index(['8'], dtype='object'),
 (7, 10): Index(['9'], dtype='object'),
 (7, 11): Index(['10'], dtype='object')}

However, the function I am using in the future requires the values inside the tuples of this groupby function to be strings, not integers. Thus I simply change the indicies in 'a' to be strings instead of integers through the following:

a[['alpha','beta']] = a[['alpha','beta']].astype(str)

I then perform the same function as previous to group the dataframe by 'alpha' and 'beta'. HOWEVER, when I do this, the order of indexing is no longer preserved:

b = a.groupby(['alpha','beta'])

b.groups
Out[26]: 
{('7', '1'): Index(['0'], dtype='object'),
 ('7', '10'): Index(['9'], dtype='object'),
 ('7', '11'): Index(['10'], dtype='object'),
 ('7', '2'): Index(['1'], dtype='object'),
 ('7', '3'): Index(['2'], dtype='object'),
 ('7', '4'): Index(['3'], dtype='object'),
 ('7', '5'): Index(['4'], dtype='object'),
 ('7', '6'): Index(['5'], dtype='object'),
 ('7', '7'): Index(['6'], dtype='object'),
 ('7', '8'): Index(['7'], dtype='object'),
 ('7', '9'): Index(['8'], dtype='object')}

As you can see, some of the index locations have "flip-flopped". I have tried forcing the groupby input sort=False, but it returns the same thing. On one note, I would like to know how to stop this from happening, but I am also curious as to WHY this is happening (trying to understand pandas more). Is there a hierarchical order that pandas is looking for when the dtype is an 'object' instead of an 'integer'?

Thanks in advance.

jezrael · Accepted Answer

Reason is groupby sorting by default, so for avoid it need:

b = a.groupby(['alpha','beta'], sort=False)

Sorting stings is lexicographically, so it seems get flip-flopped values.

EDIT: Unfortunately it does not help and reason is because dictionary under python 3.6 are not orderable.

Pandas groupby indexing preservation

Answers (2)

Related Questions