Reputation: 11833
I have a dataframe like below:
df = data.frame(a = runif(10,0,10),
b = runif(10,1,10),
c = runif(10,0,12))
How can I find the n largest values from this dataframe? We can easily find top n from a vector. Is there any good way to find the top n from a dataframe? Thanks a lot.
Upvotes: 3
Views: 4561
Reputation: 27090
I suspect you're looking for slice_max() from dplyr.
Given, for example, the data below:
> df = data.frame(a = runif(5,0,10),
+ b = runif(5,1,10),
+ c = runif(5,-1,9))
> df
a b c
1 1.953615 6.663370 6.95084517
2 1.564794 2.376268 1.46826979
3 5.052276 3.609657 0.84467786
4 3.800541 5.506710 5.64018236
5 9.823815 9.158154 -0.03483406
We can get the three topmost rows (the count is set by the parameter n), sorted by column a...
> slice_max(df, n=3, order_by=a)
a b c
1 9.823815 9.158154 -0.03483406
2 5.052276 3.609657 0.84467786
3 3.800541 5.506710 5.64018236
...column b...
> slice_max(df, n=3, order_by=b)
a b c
1 9.823815 9.158154 -0.03483406
2 1.953615 6.663370 6.95084517
3 3.800541 5.506710 5.64018236
...or column c:
> slice_max(df, n=3, order_by=c)
a b c
1 1.953615 6.663370 6.950845
2 3.800541 5.506710 5.640182
3 1.564794 2.376268 1.468270
Upvotes: 0
Reputation: 953
You can use tidyr::gather() and dplyr::top_n().
First gather every column into a single column using gather(key, value), then keep the top n elements using top_n(). For example, the top 5:
library(tidyverse) # dplyr and tidyr
set.seed(10)
mydf <-
data.frame(a = runif(10,0,10),
b = runif(10,1,10),
c = runif(10,0,12))
In gather(), you can freely choose the names passed as key and value.
The wt argument of top_n() must then be the value name you chose.
mydf %>%
gather(key = "key", value = "value") %>%
top_n(5, wt = value) %>%
arrange(desc(value)) # sort by value
#> key value
#> 1 c 10.38
#> 2 c 10.06
#> 3 c 9.30
#> 4 c 9.25
#> 5 b 8.53
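In current tidyr and dplyr releases, gather() and top_n() are marked as superseded in favour of pivot_longer() and slice_max(). A minimal sketch of the same pipeline with those functions, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0 are installed (the name top5 is only for illustration):

```r
library(dplyr)
library(tidyr)

set.seed(10)
mydf <- data.frame(a = runif(10, 0, 10),
                   b = runif(10, 1, 10),
                   c = runif(10, 0, 12))

top5 <- mydf %>%
  # reshape to long format: one row per cell, with its column name in `key`
  pivot_longer(everything(), names_to = "key", values_to = "value") %>%
  # keep the 5 largest values; the result is sorted in descending order
  slice_max(value, n = 5)

top5
```

Note that slice_max() keeps ties by default (with_ties = TRUE), so it can return more than n rows when values are duplicated.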
This gives the top n values together with their column names.
However, if you only want the values, you can use unlist().
unlist(mydf) %>% # optionally, use.names = FALSE
sort(decreasing = TRUE) %>%
.[1:5]
#> c1 c7 c3 c9 b10
#> 10.38 10.06 9.30 9.25 8.53
Upvotes: 1
Reputation: 323376
Maybe you can try stack():
N=2
sort(stack(df)$values, decreasing=TRUE)[1:N]
[1] 10.884644 9.912067
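Because stack() also records the source column in ind, the same idea can return the column name next to each value. A rough sketch in base R, assuming a data frame shaped like the one in the question (the seed and the names stacked and top are illustrative, not from the original post):

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(a = runif(10, 0, 10),
                 b = runif(10, 1, 10),
                 c = runif(10, 0, 12))

N <- 2
stacked <- stack(df)  # two columns: values, and ind (the source column name)

# order rows by value, largest first, and keep the first N
top <- stacked[order(stacked$values, decreasing = TRUE)[seq_len(N)], ]
top
```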
Upvotes: 1
Reputation: 389275
unlist the data frame to convert it into a vector, sort it, and take the top values. So for the top 2 values we can do
tail(sort(unlist(df, use.names = FALSE)), 2)
#[1] 9.581705 9.591726
If it's a matrix, you won't need unlist:
tail(sort(as.matrix(df)), 2)
data
set.seed(1233)
df = data.frame(a = runif(10,0,10),
b = runif(10,1,10),
c = runif(10,0,12))
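If the positions of the top values matter as well, the matrix form can be combined with order() and arrayInd(). This is only a sketch of one possible approach; the names m, idx, pos and result are made up for illustration:

```r
set.seed(1233)
df <- data.frame(a = runif(10, 0, 10),
                 b = runif(10, 1, 10),
                 c = runif(10, 0, 12))

n <- 2
m <- as.matrix(df)

# linear (column-major) indices of the n largest cells, largest first
idx <- order(m, decreasing = TRUE)[seq_len(n)]

# convert linear indices to (row, column) positions
pos <- arrayInd(idx, dim(m))

result <- data.frame(row    = pos[, 1],
                     column = colnames(m)[pos[, 2]],
                     value  = m[idx])
result
```

Unlike tail(sort(...)), this returns the values in descending order.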
Upvotes: 0