Reputation: 509
I have a multiindex dataframe, I need to get highest and lowest scores as a dataframe
score
date asset
01-01-16 XO 8
VXO 3
NP 4
GE 2
LE 1
import pandas as pd
pdf =pd.DataFrame({'date':['01-01-16','01-01-16','01-01-16','01-01-16','01-01-16'],'asset':["XO","VXO","NP","GE","LE"],'score':[8,3,4,2,1]})
cdf = pdf.set_index(['date', 'asset'])
cdf
I tried cdf[(cdf.score < 2) & (cdf.score >4) ] which returns no values.
Expected output:
score
date asset
01-01-16 XO 8
LE 1
Upvotes: 1
Views: 571
Reputation: 863731
Use DataFrame.iloc
for select by positions - first and last row:
cdf = cdf.sort_values('score', ascending=False).iloc[[0, -1]]
print (cdf)
score
date asset
01-01-16 XO 8
LE 1
EDIT - You can also select index values by Series.idxmax
and
Series.idxmin
:
cdf = cdf.loc[[cdf.score.idxmax(), cdf.score.idxmin()]]
print (cdf)
score
date asset
01-01-16 XO 8
LE 1
If possible multiple max and min values:
cdf = cdf[(cdf.score == cdf.score.max()) | (cdf.score == cdf.score.min()) ]
print (cdf)
score
date asset
01-01-16 XO 8
LE 1
Upvotes: 1