I have the following dataframe, which I obtained using df.groupby(['departamento','campo']).describe() (only the mean, std, min and max columns are shown):
df_statistics:
                         produccion
                               mean           std           min           max
departamento campo
f7fd2c4f     8dd7c41b   4714.695603   1076.940951   3091.015553   6378.546534
             82edafb9   1851.291482    841.512944    675.814722   3006.476183
             58a0d8ca   1768.151315    347.896113   1033.459536   2242.544338
             8ba362f3    257.917212    231.490925      0.000000    497.916659
             4f4a249f    192.811711     80.299111    129.190598    356.437730
             741abe20    431.717352     71.053604    291.831556    529.518332
             51cbb05d    489.804186     65.542073    353.186216    582.869264
             4d0fb45e    358.597250     30.166391    314.168045    407.842103
             c98bd9dd    437.244383     27.135823    402.546159    481.245852
             7eb34927    106.426374     22.579237     81.994706    142.283652
ec12ad00     44502c89     15.015145     11.467353      0.000000     29.241879
             5558f26e      1.107400      0.959445      0.000000      2.762156
             85c1a0e5      0.122720      0.425113      0.000000      1.472635
cf33cb8a     2f614c0b  12458.858168  12042.715975    150.635367  25999.977584
             5559f8d7   4272.447078   1326.999765   2458.231739   6059.658900
             fd6f6562   3378.712031   1194.101786    869.763739   4814.220212
             febb6cf6   4149.936221    833.663173   2471.139924   5827.822674
             d56beadb    474.831361    810.840341      0.000000   2283.465569
             124207de   3863.484888    796.945367   2713.111304   5150.735620
             1fd2689f   6099.963902    768.102604   4766.241346   7897.993261
             c728bf96   3361.623457    704.293795   2203.721911   4949.989960
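For reference, a minimal sketch of how a frame with this layout arises, assuming a hypothetical raw table with one row per measurement (the ids d1, d2, a, b, c and the values are invented). groupby(...).describe() produces MultiIndex columns, which is why the std column has to be addressed as the tuple ('produccion', 'std'):

```python
import pandas as pd

# Hypothetical raw data: ids and values are invented for illustration only
raw = pd.DataFrame({
    "departamento": ["d1", "d1", "d1", "d1", "d2", "d2"],
    "campo":        ["a",  "a",  "b",  "b",  "c",  "c"],
    "produccion":   [1.0,  3.0,  2.0,  2.0,  5.0,  9.0],
})

# describe() on a grouped frame returns MultiIndex columns:
# level 0 is the source column ("produccion"), level 1 is the statistic
stats = raw.groupby(["departamento", "campo"]).describe()

# Keep only the four statistics shown above; columns are addressed by tuples
wanted = [("produccion", s) for s in ["mean", "std", "min", "max"]]
df_statistics = stats.loc[:, wanted]
print(df_statistics)
```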
I have sorted the dataframe by the standard deviation ('std') column, but I want to show only the top 5 rows for each 'departamento' group.
I tried the following code: df_statistics.nlargest(5, columns=('produccion', 'std'))
but this returns the top 5 rows across all 'departamento' groups rather than the top 5 within each group:
                         produccion
                               mean           std           min           max
departamento campo
cf33cb8a     2f614c0b  12458.858168  12042.715975    150.635367  25999.977584
             5559f8d7   4272.447078   1326.999765   2458.231739   6059.658900
             fd6f6562   3378.712031   1194.101786    869.763739   4814.220212
f7fd2c4f     8dd7c41b   4714.695603   1076.940951   3091.015553   6378.546534
             82edafb9   1851.291482    841.512944    675.814722   3006.476183
How can I show the top 5 rows of each 'departamento' group, ranked by the 'std' column?
IIUC, since the frame is already sorted by std, you can take the first 5 rows of each group:
df.groupby('departamento').head(5)
Output:
                         produccion
                               mean           std           min           max
departamento campo
f7fd2c4f     8dd7c41b   4714.695603   1076.940951   3091.015553   6378.546534
             82edafb9   1851.291482    841.512944    675.814722   3006.476183
             58a0d8ca   1768.151315    347.896113   1033.459536   2242.544338
             8ba362f3    257.917212    231.490925      0.000000    497.916659
             4f4a249f    192.811711     80.299111    129.190598    356.437730
ec12ad00     44502c89     15.015145     11.467353      0.000000     29.241879
             5558f26e      1.107400      0.959445      0.000000      2.762156
             85c1a0e5      0.122720      0.425113      0.000000      1.472635
cf33cb8a     2f614c0b  12458.858168  12042.715975    150.635367  25999.977584
             5559f8d7   4272.447078   1326.999765   2458.231739   6059.658900
             fd6f6562   3378.712031   1194.101786    869.763739   4814.220212
             febb6cf6   4149.936221    833.663173   2471.139924   5827.822674
             d56beadb    474.831361    810.840341      0.000000   2283.465569
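Note that groupby(...).head(5) does not rank anything: it keeps the first five rows of each group in their current order. On df_statistics that matches the top five only because each group is already in descending std order. A small sketch with a hypothetical, deliberately unsorted frame (ids and values invented; head(2) keeps it short) shows the difference:

```python
import pandas as pd

# Hypothetical, deliberately *unsorted* frame with the same index/column layout
idx = pd.MultiIndex.from_tuples(
    [("d1", "a"), ("d1", "b"), ("d1", "c"), ("d2", "x"), ("d2", "y"), ("d2", "z")],
    names=["departamento", "campo"],
)
cols = pd.MultiIndex.from_tuples([("produccion", "std")])
df = pd.DataFrame([[1.0], [9.0], [5.0], [2.0], [7.0], [4.0]], index=idx, columns=cols)

# head(n) keeps the first n rows of each group in their current order;
# it never looks at the values, so on unsorted data it is not a "top n"
first_two = df.groupby(level="departamento").head(2)
print(first_two)
```

Here d1 keeps a and b (std 1 and 9) rather than b and c (std 9 and 5).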
@recentadvance is correct,
df.sort_values(by=('produccion', 'std'), ascending=False)\
.groupby('departamento')\
.head(5)\
.sort_index()
Sort the dataframe first, then use groupby with head, and finally sort_index.
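A minimal, self-contained sketch of the sort-then-head chain on a hypothetical unsorted frame (ids and std values invented; only the std column is included). groupby(level="departamento") is used here, which should behave the same as groupby('departamento') when 'departamento' is an index level name:

```python
import pandas as pd

# Hypothetical, unsorted frame laid out like df_statistics (values invented)
idx = pd.MultiIndex.from_tuples(
    [("d1", f"c{i}") for i in range(7)] + [("d2", "x"), ("d2", "y")],
    names=["departamento", "campo"],
)
cols = pd.MultiIndex.from_tuples([("produccion", "std")])
std_values = [3.0, 8.0, 1.0, 6.0, 2.0, 9.0, 5.0, 4.0, 7.0]
df = pd.DataFrame([[v] for v in std_values], index=idx, columns=cols)

top5 = (
    df.sort_values(by=[("produccion", "std")], ascending=False)  # largest std first
      .groupby(level="departamento")                             # one group per departamento
      .head(5)                                                   # first 5 rows per group = 5 largest
      .sort_index()                                              # regroup rows by the index
)
print(top5)
```

Because sort_index sorts on both index levels by default, the rows within each 'departamento' come out ordered by 'campo', not by std.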
Use another groupby:
df_statistics.groupby('departamento')\
.apply(lambda grp: grp.nlargest(5, columns=('produccion', 'std')))
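A sketch of the same nlargest approach on a hypothetical frame (ids and values invented). It adds group_keys=False, assuming the extra index level is unwanted: with the default group_keys=True, apply typically prepends the group key, so 'departamento' can end up in the index twice.

```python
import pandas as pd

# Hypothetical, unsorted frame laid out like df_statistics (values invented)
idx = pd.MultiIndex.from_tuples(
    [("d1", f"c{i}") for i in range(7)] + [("d2", "x"), ("d2", "y")],
    names=["departamento", "campo"],
)
cols = pd.MultiIndex.from_tuples([("produccion", "std")])
df = pd.DataFrame(
    [[v] for v in [3.0, 8.0, 1.0, 6.0, 2.0, 9.0, 5.0, 4.0, 7.0]],
    index=idx,
    columns=cols,
)

# nlargest picks the 5 largest std values inside each group;
# group_keys=False stops apply() from prepending a second 'departamento' level
top5 = df.groupby(level="departamento", group_keys=False).apply(
    lambda grp: grp.nlargest(5, columns=("produccion", "std"))
)
print(top5)
```

Unlike the sort/head/sort_index chain, nlargest returns each group already in descending std order.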