Reputation: 793
I have a dataframe with 74 columns and 1000 rows. I'd like to find the 20 smallest values in each column, calculate the mean of these 20 values, and return the result as a transposed dataframe with one column and 74 rows.
1 2 3
A 2013918.153207 2010286.148942 2010903.782339
B 1694927.195604 1648518.272357 1665890.462014
C 1548895.121455 1594033.016024 1589820.170989
Is there a simple way to do this in Python?
Upvotes: 2
Views: 1812
Reputation: 862661
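You can use nsmallest with mean. Both work on a Series (a single column), so you need apply to run them on every column. The sample has only 3 rows, so this example takes the 2 smallest values; use nsmallest(20) for the real data: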
print (df.apply(lambda x: x.nsmallest(2).mean()).to_frame('val'))
val
1 1.621911e+06
2 1.621276e+06
3 1.627855e+06
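For the shape described in the question, the same approach might look like the sketch below. The random data, the column labels 1 to 74, and the variable names are assumptions made up for illustration; only the apply / nsmallest / mean chain comes from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: 1000 rows, 74 columns of random floats.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(1.5e6, 2.1e6, size=(1000, 74)),
                  columns=range(1, 75))

n = 20  # how many of the smallest values to average in each column

# apply passes each column to the lambda as a Series;
# nsmallest(n) keeps the n lowest values and mean averages them.
result = df.apply(lambda col: col.nsmallest(n).mean()).to_frame('val')

print(result.shape)
print(result.head())
```

The result is indexed by the original column labels, so it already has one row per column and needs no explicit transpose.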
NumPy solution:
First convert to a numpy array, sort each column, select the first rows and take their mean. Finally, use the DataFrame constructor:
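import numpy as np
import pandas as pd
# np.sort returns a sorted copy; an in-place arr.sort() on df.values can also reorder df itself
arr = np.sort(df.values, axis=0)
print (arr)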
[[ 1548895.121455 1594033.016024 1589820.170989]
[ 1694927.195604 1648518.272357 1665890.462014]
[ 2013918.153207 2010286.148942 2010903.782339]]
print (np.mean(arr[:2,:], axis=0))
[ 1621911.1585295 1621275.6441905 1627855.3165015]
print (pd.DataFrame({'val':np.mean(arr[:2,:], axis=0)}, index=df.columns))
val
1 1.621911e+06
2 1.621276e+06
3 1.627855e+06
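A full sort is not strictly required when only the n smallest values matter. As a possible variation (not part of the answer above), np.partition can place the n smallest values of each column in the first n rows without ordering the rest. This is a sketch on made-up random data; the frame and the names part and result_partition are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the shape from the question: 1000 rows, 74 columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(1.5e6, 2.1e6, size=(1000, 74)),
                  columns=range(1, 75))

n = 20

# np.partition returns a copy; with kth = n - 1 and axis=0, the first n rows
# of every column hold that column's n smallest values (in no particular order).
part = np.partition(df.to_numpy(), n - 1, axis=0)

# Average those n rows column by column and label the result with the column names.
result_partition = pd.DataFrame({'val': part[:n].mean(axis=0)}, index=df.columns)

print(result_partition.shape)
```

Since the mean does not depend on order, this should give the same numbers as the sort-based version.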
Upvotes: 1