AishwaryaKulkarni

Python pandas: rank/sort within a groupby, where the group column differs for each input

I have the following dataframe:

Signature  Genes  Labels  Scores        Annotation
CELF1      AARS   0       -5.439356884  EMPTY
CELF1      AATF   0       -5.882719549  EMPTY
CELF1      ABCF1  0       -6.011462342  EMPTY
HNRNPC     AARS   0       -6.166240409  EMPTY
HNRNPC     AATF   0       -6.432658981  EMPTY
HNRNPC     ABCF1  0       -6.476526092  EMPTY
FUS        AARS   0       -5.646015964  EMPTY
FUS        AATF   0       -6.224914841  EMPTY
FUS        ABCF1  0       -6.395334389  EMPTY

Within each Signature, I want to rank the Genes by their Scores, so that the highest (least negative) score gets rank 1:

Signature  Genes  Labels  Scores        Annotation  Rank
CELF1      AARS   0       -5.439356884  EMPTY       1
CELF1      AATF   0       -5.882719549  EMPTY       2
CELF1      ABCF1  0       -6.011462342  EMPTY       3
HNRNPC     AARS   0       -6.166240409  EMPTY       1
HNRNPC     AATF   0       -6.432658981  EMPTY       2
HNRNPC     ABCF1  0       -6.476526092  EMPTY       3
FUS        AARS   0       -5.646015964  EMPTY       1
FUS        AATF   0       -6.224914841  EMPTY       2
FUS        ABCF1  0       -6.395334389  EMPTY       3

Following this post, my code was something like this:

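    import pandas as pd
    data = pd.read_csv("trial1.csv", sep='\t')
    # Highest score per (Signature, Genes) pair
    data['max_score'] = data.groupby(['Signature', 'Genes'])['Scores'].transform('max').astype(float)
    # Rank max_score within each Signature
    data['rank'] = data.groupby('Signature')['max_score'].rank()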
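A self-contained reproduction of that snippet, as a sketch: the sample data is built inline instead of read from trial1.csv (the file isn't shown, so this assumes it holds exactly the nine rows above). Because each (Signature, Genes) pair occurs only once, max_score ends up as a copy of Scores, and rank() with its default settings numbers each Signature's values from smallest to largest, so the most negative score gets rank 1.

```python
import pandas as pd

# Sample data from the question, built inline (assumed contents of trial1.csv)
data = pd.DataFrame({
    "Signature": ["CELF1"] * 3 + ["HNRNPC"] * 3 + ["FUS"] * 3,
    "Genes": ["AARS", "AATF", "ABCF1"] * 3,
    "Labels": [0] * 9,
    "Scores": [-5.439356884, -5.882719549, -6.011462342,
               -6.166240409, -6.432658981, -6.476526092,
               -5.646015964, -6.224914841, -6.395334389],
    "Annotation": ["EMPTY"] * 9,
})

# Same two steps as in the question
data["max_score"] = (
    data.groupby(["Signature", "Genes"])["Scores"].transform("max").astype(float)
)
# Default rank(): ascending order, ties averaged, float results
data["rank"] = data.groupby("Signature")["max_score"].rank()

print(data[["Signature", "Genes", "Scores", "rank"]])
```

Row by row, this assigns AARS rank 3.0, AATF 2.0 and ABCF1 1.0 in every Signature, the same ranks as in the output that follows (which lists the rows sorted by rank).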

However, my Scores appear to be ranked by absolute value instead, as follows:

Signature  Genes  Labels  Scores        Annotation  Rank
CELF1      ABCF1  0       -6.011462342  EMPTY       1
CELF1      AATF   0       -5.882719549  EMPTY       2
CELF1      AARS   0       -5.439356884  EMPTY       3
HNRNPC     ABCF1  0       -6.476526092  EMPTY       1
HNRNPC     AATF   0       -6.432658981  EMPTY       2
HNRNPC     AARS   0       -6.166240409  EMPTY       3
FUS        ABCF1  0       -6.395334389  EMPTY       1
FUS        AATF   0       -6.224914841  EMPTY       2
FUS        AARS   0       -5.646015964  EMPTY       3


Answers (1)

PMende

rank() isn't ranking by absolute value. It ranks in ascending order, which is its default, so the most negative score gets rank 1. You simply need to change your call from rank() to rank(ascending=False). See the documentation.
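A minimal sketch of the fix applied to the question's sample data, built inline rather than read from trial1.csv. Two choices here are assumptions beyond the answer: method="min" (tied scores share the lower rank) so the float ranks can be safely cast to int to match the desired 1, 2, 3 output, and the column name Rank taken from the desired table. The max_score step is kept to mirror the question, although with unique (Signature, Genes) pairs it equals Scores.

```python
import pandas as pd

data = pd.DataFrame({
    "Signature": ["CELF1"] * 3 + ["HNRNPC"] * 3 + ["FUS"] * 3,
    "Genes": ["AARS", "AATF", "ABCF1"] * 3,
    "Labels": [0] * 9,
    "Scores": [-5.439356884, -5.882719549, -6.011462342,
               -6.166240409, -6.432658981, -6.476526092,
               -5.646015964, -6.224914841, -6.395334389],
    "Annotation": ["EMPTY"] * 9,
})

data["max_score"] = (
    data.groupby(["Signature", "Genes"])["Scores"].transform("max").astype(float)
)

# ascending=False: the highest (least negative) score in each Signature gets rank 1.
# method="min" (assumed tie rule) yields whole-number ranks, so the int cast is lossless.
data["Rank"] = (
    data.groupby("Signature")["max_score"]
        .rank(ascending=False, method="min")
        .astype(int)
)

print(data.drop(columns="max_score"))
```

With ascending=False the ranks are assigned from the largest signed value down, which is why -5.44 ranks ahead of -6.01 even though its absolute value is smaller.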
