Nico

Reputation: 373

Sort pandas DataFrame ignoring certain values

I have a pandas DataFrame with values close to 1 and close to 0:

import pandas as pd

df = pd.DataFrame({
    'colA': (0.97, 0.88, 0.03, 0.02),
    'colB': (0.01, 0.03, 0.87, 0.99),
})

Sorting it by values gives the following (sorting by colB obviously has no effect):

df.sort_values(['colA','colB'], ascending=False)
>>    colA  colB
>> 0  0.97  0.01
>> 1  0.88  0.03
>> 2  0.03  0.87
>> 3  0.02  0.99

However, I would like to sort based on only the larger values, say > 0.5. This would ignore the smaller values for colA and switch to colB for further sorting.

The sorted DataFrame would look like this (rows 2 and 3 are switched):

df.some_function(['colA','colB'], ascending=False, condition=i>0.5)
>>    colA  colB
>> 0  0.97  0.01
>> 1  0.88  0.03
>> 2  0.02  0.99
>> 3  0.03  0.87

Thanks so much for your help!

Upvotes: 5

Views: 1743

Answers (3)

Kaushik J

Reputation: 1072

Filter the DataFrame based on the condition, sort each part, then append them:

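# Split on the condition and sort each part in descending order
mask = df['colA'] > 0.5
df1 = df[mask].sort_values('colA', ascending=False)
df2 = df[~mask].sort_values('colB', ascending=False)

# DataFrame.append was removed in pandas 2.0, so use pd.concat
final_frame = pd.concat([df1, df2])

   colA  colB
0  0.97  0.01
1  0.88  0.03
3  0.02  0.99
2  0.03  0.87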

Upvotes: 0

jezrael

Reputation: 863226

The idea is to replace the non-matching values with missing values, sort, and then reorder the original DataFrame by the resulting index:

idx = (df[['colA','colB']].where(df[['colA','colB']] > 0.5)
           .sort_values(['colA','colB'], ascending=False).index)

df1 = df.loc[idx]
print (df1)
   colA  colB
0  0.97  0.01
1  0.88  0.03
3  0.02  0.99
2  0.03  0.87

Detail:

print (df[['colA','colB']].where(df[['colA','colB']] > 0.5))
   colA  colB
0  0.97   NaN
1  0.88   NaN
2   NaN  0.87
3   NaN  0.99


print (df[['colA','colB']].where(df[['colA','colB']] > 0.5)
                          .sort_values(['colA','colB'], ascending=False))
   colA  colB
0  0.97   NaN
1  0.88   NaN
3   NaN  0.99
2   NaN  0.87
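
The steps above could be wrapped in a small reusable helper. This is only a sketch: the function name sort_ignoring_small and its threshold parameter are made up for illustration and are not part of pandas. It also assumes the DataFrame has a unique index, since df.loc is used to reorder the rows.

```python
import pandas as pd


def sort_ignoring_small(df, columns, threshold=0.5, ascending=False):
    """Sort df by columns, treating values <= threshold as missing.

    Hypothetical helper, not a pandas API. Assumes df has a unique index.
    """
    # Values at or below the threshold become NaN, which sort_values
    # places last by default, so they no longer influence the order.
    masked = df[columns].where(df[columns] > threshold)
    order = masked.sort_values(columns, ascending=ascending).index
    # Reorder the original (unmasked) rows by the sorted index.
    return df.loc[order]


df = pd.DataFrame({
    'colA': (0.97, 0.88, 0.03, 0.02),
    'colB': (0.01, 0.03, 0.87, 0.99),
})

result = sort_ignoring_small(df, ['colA', 'colB'])
print(result)
```

Because the masked rows all hold NaN in colA, they tie on that column and colB decides their order, which is what the question asks for.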

Upvotes: 3

Roy2012

Reputation: 12523

Build a new column that is the same as 'colA' but with the smaller values set to 0, then sort by this new column and 'colB':

import numpy as np

df.assign(simplified_a = np.where(df.colA<0.5, 0, df.colA))\
  .sort_values(["simplified_a", "colB"], ascending=False).drop("simplified_a", axis=1)

Result:

   colA  colB
0  0.97  0.01
1  0.88  0.03
3  0.02  0.99
2  0.03  0.87
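
A possible variant that avoids the temporary column, assuming pandas 1.1 or later: sort_values accepts a key callable that is applied to each sort column before sorting. The sketch below masks values at or below 0.5 with NaN inside the key instead of replacing them with 0; the lambda is only an illustration of the idea, not the answer's original method.

```python
import pandas as pd

df = pd.DataFrame({
    'colA': (0.97, 0.88, 0.03, 0.02),
    'colB': (0.01, 0.03, 0.87, 0.99),
})

# The key function receives each sort column as a Series and must
# return a Series of the same length. Masked values become NaN and
# are placed last (na_position='last' is the default).
result = df.sort_values(
    ['colA', 'colB'],
    ascending=False,
    key=lambda s: s.where(s > 0.5),
)
print(result)
```

The key only affects the ordering; the returned DataFrame still contains the original, unmasked values.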

Upvotes: 1
