Reputation: 784
I have the following dataframe:
Signature Genes Labels Scores Annotation
CELF1 AARS 0 -5.439356884 EMPTY
CELF1 AATF 0 -5.882719549 EMPTY
CELF1 ABCF1 0 -6.011462342 EMPTY
HNRNPC AARS 0 -6.166240409 EMPTY
HNRNPC AATF 0 -6.432658981 EMPTY
HNRNPC ABCF1 0 -6.476526092 EMPTY
FUS AARS 0 -5.646015964 EMPTY
FUS AATF 0 -6.224914841 EMPTY
FUS ABCF1 0 -6.395334389 EMPTY
I want to rank my 'Scores' Column based on in a Signature column rank 'Genes' based on Scores column such that
Signature Genes Labels Scores Annotation Rank
CELF1 AARS 0 -5.439356884 EMPTY 1
CELF1 AATF 0 -5.882719549 EMPTY 2
CELF1 ABCF1 0 -6.011462342 EMPTY 3
HNRNPC AARS 0 -6.166240409 EMPTY 1
HNRNPC AATF 0 -6.432658981 EMPTY 2
HNRNPC ABCF1 0 -6.476526092 EMPTY 3
FUS AARS 0 -5.646015964 EMPTY 1
FUS AATF 0 -6.224914841 EMPTY 2
FUS ABCF1 0 -6.395334389 EMPTY 3
I followed based on this post. My code was something like this:
data=pd.read_csv("trial1.csv",sep='\t')
data['max_score'] = data.groupby(['Signature','Genes'])['Scores'].transform('max').astype(float)
data['rank']=data.groupby('Signature')['max_score'].rank()
However my Scores get ranked based on the absolute values, as follows:
Signature Genes Labels Scores Annotation Rank
CELF1 ABCF1 0 -6.011462342 EMPTY 1
CELF1 AATF 0 -5.882719549 EMPTY 2
CELF1 AARS 0 -5.439356884 EMPTY 3
HNRNPC ABCF1 0 -6.476526092 EMPTY 1
HNRNPC AATF 0 -6.432658981 EMPTY 2
HNRNPC AARS 0 -6.166240409 EMPTY 3
FUS ABCF1 0 -6.395334389 EMPTY 1
FUS AATF 0 -6.224914841 EMPTY 2
FUS AARS 0 -5.646015964 EMPTY 3
Upvotes: 1
Views: 167
Reputation: 5460
Rank isn't sorting by absolute value. It's sorting by ascending order, which is its default. You simply need to change your call to rank()
to be rank(ascending=False)
. See the documentation.
Upvotes: 2