zesla

Find the top n largest values in a data frame (or matrix) in R

I have a data frame like the one below:

df = data.frame(a = runif(10, 0, 10),
                b = runif(10, 1, 10),
                c = runif(10, 0, 12))

How can I find the n largest values in this data frame? Finding the top n of a vector is easy; is there a good way to do the same across a whole data frame? Thanks a lot.

Answers (4)

brandizzi

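I suspect you're looking for slice_max() from dplyr (load it with library(dplyr)), which returns the n rows with the largest values of a given column.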

Given, for example, the data below:

> df = data.frame(a = runif(5,0,10),
+                 b = runif(5,1,10),
+                 c = runif(5,-1,9))
> df
         a        b           c
1 1.953615 6.663370  6.95084517
2 1.564794 2.376268  1.46826979
3 5.052276 3.609657  0.84467786
4 3.800541 5.506710  5.64018236
5 9.823815 9.158154 -0.03483406

We can get the three top rows (the count is set by the parameter n), ordered by column a...

> slice_max(df, n=3, order_by=a)
         a        b           c
1 9.823815 9.158154 -0.03483406
2 5.052276 3.609657  0.84467786
3 3.800541 5.506710  5.64018236

...column b...

> slice_max(df, n=3, order_by=b)
         a        b           c
1 9.823815 9.158154 -0.03483406
2 1.953615 6.663370  6.95084517
3 3.800541 5.506710  5.64018236

...or column c:

> slice_max(df, n=3, order_by=c)
         a        b        c
1 1.953615 6.663370 6.950845
2 3.800541 5.506710 5.640182
3 1.564794 2.376268 1.468270
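One detail worth knowing: by default slice_max() keeps ties, so it can return more than n rows when several rows share the cutoff value. A minimal sketch, assuming dplyr 1.0.0 or later, using a small hypothetical data frame (ties_df) and the with_ties argument to force exactly n rows:

```r
library(dplyr)

# Hypothetical data frame in which two rows tie at the top of column a
ties_df <- data.frame(a = c(5, 9, 9, 1, 3),
                      b = c(2, 4, 6, 8, 10))

# Default (with_ties = TRUE): both rows with a == 9 are returned even though n = 1
all_ties <- slice_max(ties_df, order_by = a, n = 1)

# with_ties = FALSE: exactly n rows are returned
exactly_one <- slice_max(ties_df, order_by = a, n = 1, with_ties = FALSE)

nrow(all_ties)     # 2
nrow(exactly_one)  # 1
```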

younggeun

You can use tidyr::gather() and dplyr::top_n().

First, gather all columns into a single pair of key/value columns with gather(key, value), then keep the top n rows with top_n(). For example, the top 5:

library(tidyverse) # dplyr and tidyr
set.seed(10)
mydf <- 
  data.frame(a = runif(10,0,10),
            b = runif(10,1,10),
            c = runif(10,0,12))

In gather(), you can choose any names for the key and value columns.

Pass the name you chose for the value column as the wt argument of top_n().

mydf %>% 
  gather(key = "key", value = "value") %>% 
  top_n(5, wt = value) %>% 
  arrange(desc(value)) # sort by value
#>   key value
#> 1   c 10.38
#> 2   c 10.06
#> 3   c  9.30
#> 4   c  9.25
#> 5   b  8.53

This gives the top n values together with the name of the column each one came from.
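gather() and top_n() still work, but the current tidyr and dplyr documentation marks them as superseded by pivot_longer() and slice_max(). A minimal sketch of the same pipeline with the newer verbs, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 (the column names "key" and "value" are just the ones used above):

```r
library(dplyr)
library(tidyr)

set.seed(10)
mydf <- data.frame(a = runif(10, 0, 10),
                   b = runif(10, 1, 10),
                   c = runif(10, 0, 12))

top5 <- mydf %>%
  # reshape to long format: one row per cell, source column name in "key"
  pivot_longer(everything(), names_to = "key", values_to = "value") %>%
  # keep the 5 largest values; with_ties = FALSE guarantees exactly 5 rows
  slice_max(value, n = 5, with_ties = FALSE)

top5
```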

If you only want the values, you can use unlist():

unlist(mydf) %>% # optionally, use.names = FALSE
  sort(decreasing = TRUE) %>% 
  .[1:5]
#>    c1    c7    c3    c9   b10 
#> 10.38 10.06  9.30  9.25  8.53
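If you also need the row of each top value, not just its column, one base-R option is to order the matrix form of the data and convert the linear indices with arrayInd(). A minimal sketch, assuming all columns are numeric; the helper name top_n_positions is hypothetical:

```r
set.seed(10)
mydf <- data.frame(a = runif(10, 0, 10),
                   b = runif(10, 1, 10),
                   c = runif(10, 0, 12))

# Hypothetical helper: row, column and value of the n largest cells
top_n_positions <- function(df, n) {
  m <- as.matrix(df)                           # assumes every column is numeric
  idx <- head(order(m, decreasing = TRUE), n)  # linear indices of the n largest cells
  pos <- arrayInd(idx, dim(m))                 # convert to (row, column) pairs
  data.frame(row    = pos[, 1],
             column = colnames(m)[pos[, 2]],
             value  = m[idx],
             stringsAsFactors = FALSE)
}

top_n_positions(mydf, 5)
```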

BENY

Maybe you can use stack(), which puts the values of all columns into a single values column:

N=2
sort(stack(df)$values, decreasing=TRUE)[1:N]
[1] 10.884644  9.912067
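stack() also records the source column of each value in its ind column, so ordering the stacked data frame, instead of sorting only $values, keeps that label. A minimal base-R sketch with its own copy of the data (the seed is arbitrary):

```r
set.seed(1)  # arbitrary seed for reproducibility
df <- data.frame(a = runif(10, 0, 10),
                 b = runif(10, 1, 10),
                 c = runif(10, 0, 12))
N <- 2

s <- stack(df)  # columns: values, ind (factor naming the source column)
top <- s[order(s$values, decreasing = TRUE)[seq_len(N)], ]
top
```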

Ronak Shah

Use unlist() to flatten the data frame into a vector, sort it, and take the top values. For the top 2 values we can do the following (tail() of an ascending sort, so the largest value comes last):

tail(sort(unlist(df, use.names = FALSE)), 2)
#[1] 9.581705 9.591726

If it's a matrix, you don't need unlist():

tail(sort(as.matrix(df)), 2)

Data:

set.seed(1233)
df = data.frame(a = runif(10,0,10),
                b = runif(10,1,10),
                c = runif(10,0,12))
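For large inputs a full sort can be more work than needed. sort()'s partial argument only guarantees that one position holds its sorted value, with everything after it at least as large, which is enough to isolate the top n. A minimal base-R sketch; the helper name largest_n is hypothetical, and it drops NAs up front to keep the positions well defined:

```r
# Hypothetical helper: the n largest values of x, largest first
largest_n <- function(x, n) {
  x <- x[!is.na(x)]  # drop NAs (this also drops any matrix dims)
  k <- length(x)
  n <- min(n, k)
  if (n == 0) return(x[0])
  # After this call, position k - n + 1 holds its sorted value and every
  # element after it is >= that value, so the last n slots are the n largest.
  p <- sort(x, partial = k - n + 1)
  sort(p[(k - n + 1):k], decreasing = TRUE)  # order just those n values
}

set.seed(1233)
df <- data.frame(a = runif(10, 0, 10),
                 b = runif(10, 1, 10),
                 c = runif(10, 0, 12))

largest_n(as.matrix(df), 2)
```

Unlike tail(sort(...), n), this returns the values largest first.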
