Carmen

Reputation: 793

How to find the n-smallest values per column in a pandas dataframe

I have a dataframe with 74 columns and 1000 rows. I'd like to find the 20 smallest values in each column, calculate the mean of those 20 values, and return the result as a transposed dataframe with one column and 74 rows. A sample of the data:

               1               2               3    
A         2013918.153207  2010286.148942  2010903.782339  
B         1694927.195604  1648518.272357  1665890.462014     
C         1548895.121455  1594033.016024  1589820.170989   

Is there a simple way to do this in Python?

Upvotes: 2

Views: 1812

Answers (1)

jezrael

Reputation: 862661

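You can combine nsmallest with mean. Both work on a Series (a single column), so use apply to run them on every column. The sample has only three rows, so n is 2 here; use 20 for your data: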

print (df.apply(lambda x: x.nsmallest(2).mean()).to_frame('val'))
            val
1  1.621911e+06
2  1.621276e+06
3  1.627855e+06
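
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame is rebuilt from the three sample rows in the question, and the helper name `mean_of_n_smallest` and its `n` parameter are illustrative choices, not part of the original post.

```python
import pandas as pd

# Sample rows copied from the question; the real frame has 74 columns and 1000 rows.
df = pd.DataFrame(
    {
        1: [2013918.153207, 1694927.195604, 1548895.121455],
        2: [2010286.148942, 1648518.272357, 1594033.016024],
        3: [2010903.782339, 1665890.462014, 1589820.170989],
    },
    index=["A", "B", "C"],
)


def mean_of_n_smallest(frame, n):
    """Hypothetical helper: mean of the n smallest values in each column."""
    # apply passes each column as a Series; nsmallest(n) keeps its n smallest values.
    return frame.apply(lambda col: col.nsmallest(n).mean()).to_frame("val")


result = mean_of_n_smallest(df, n=2)  # use n=20 for the data described in the question
print(result)
```

The result is indexed by the original column labels, so a 74-column input gives the one-column, 74-row frame asked for.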

NumPy solution:

First convert the DataFrame to a NumPy array, sort each column, take the first n rows and compute their mean. Finally, use the DataFrame constructor to build the result:

import numpy as np

# np.sort returns a sorted copy, so df itself is left unmodified
arr = np.sort(df.values, axis=0)
print (arr)
[[ 1548895.121455  1594033.016024  1589820.170989]
 [ 1694927.195604  1648518.272357  1665890.462014]
 [ 2013918.153207  2010286.148942  2010903.782339]]

print (np.mean(arr[:2,:], axis=0))
[ 1621911.1585295  1621275.6441905  1627855.3165015]

print (pd.DataFrame({'val':np.mean(arr[:2,:], axis=0)}, index=df.columns))
            val
1  1.621911e+06
2  1.621276e+06
3  1.627855e+06
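
If fully sorting 1000 rows per column feels wasteful, a variation (not from the original answer) is to swap the sort for `np.partition`, which only guarantees that the n smallest values land in the first n rows. This is a sketch under the same assumptions as above: the frame is rebuilt from the question's sample rows, and n is 2 because the sample has three rows.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; the real frame has 74 columns and 1000 rows.
df = pd.DataFrame(
    {
        1: [2013918.153207, 1694927.195604, 1548895.121455],
        2: [2010286.148942, 1648518.272357, 1594033.016024],
        3: [2010903.782339, 1665890.462014, 1589820.170989],
    },
    index=["A", "B", "C"],
)

n = 2  # use 20 for the data described in the question

# np.partition returns a copy in which, for each column, the n smallest values
# occupy the first n rows (in no particular order); kth is zero-based, hence n - 1.
part = np.partition(df.to_numpy(), n - 1, axis=0)

# Average those n rows column by column and label the result with the column names.
result = pd.DataFrame({"val": part[:n].mean(axis=0)}, index=df.columns)
print(result)
```

For a frame this small the difference from a full sort is negligible, so treat this as an option for larger data rather than a required change.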

Upvotes: 1
