Reputation: 2645
I have a dataframe with columns x1, x2, group and I would like to generate a new dataframe with an extra column rank that indicates the order of x1 in its group.
There is a related question here, but the accepted answer does not seem to work anymore.
Up to this point, everything works:
library(dplyr)
data(iris)
by_species <- iris %>%
  arrange(Species, Sepal.Length) %>%
  group_by(Species)
But when I try to get the ranks by group:
by_species <- mutate(by_species, rank=row_number())
The error is:
Error in rank(x, ties.method = "first", na.last = "keep") :
argument "x" is missing, with no default
Update
The problem was a conflict between dplyr and plyr. To reproduce the error, load both packages:
library(dplyr)
library(plyr)
data(iris)
by_species <- iris %>%
  arrange(Species, Sepal.Length) %>%
  group_by(Species) %>%
  mutate(rank=row_number())
# Error in rank(x, ties.method = "first", na.last = "keep") :
# argument "x" is missing, with no default
After unloading plyr, it works as it should:
detach("package:plyr", unload=TRUE)
by_species <- iris %>%
  arrange(Species, Sepal.Length) %>%
  group_by(Species) %>%
  mutate(rank=row_number())
by_species %>% filter(rank <= 3)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species  rank
##          (dbl)       (dbl)        (dbl)       (dbl)     (fctr) (int)
## 1          4.3         3.0          1.1         0.1     setosa     1
## 2          4.4         2.9          1.4         0.2     setosa     2
## 3          4.4         3.0          1.3         0.2     setosa     3
## 4          4.9         2.4          3.3         1.0 versicolor     1
## 5          5.0         2.0          3.5         1.0 versicolor     2
## 6          5.0         2.3          3.3         1.0 versicolor     3
## 7          4.9         2.5          4.5         1.7  virginica     1
## 8          5.6         2.8          4.9         2.0  virginica     2
## 9          5.7         2.5          5.0         2.0  virginica     3
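If detaching plyr is not an option, one possible workaround is to qualify the masked verbs with the dplyr namespace, so the attach order of the two packages no longer decides which mutate() gets called. The sketch below assumes a reasonably recent dplyr; it does not load plyr itself, and the trailing ungroup() is an addition of this example rather than part of the original code.

```r
library(dplyr)

# Explicit dplyr:: prefixes make sure the dplyr versions of the verbs are
# used, even if another package (such as plyr) masks them on the search path.
by_species <- iris %>%
  dplyr::arrange(Species, Sepal.Length) %>%
  dplyr::group_by(Species) %>%
  dplyr::mutate(rank = dplyr::row_number()) %>%
  dplyr::ungroup()

# Keep the three smallest Sepal.Length rows of each species.
top3 <- dplyr::filter(by_species, rank <= 3)
```

Here top3 should contain three rows per species, with rank running from 1 to 3 inside each group.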
Upvotes: 29
Views: 47977
Reputation: 107767
For future readers, ranking within a group can also be done in base R. Using the OP's iris example, ranking by Sepal.Length:
# ORDER BY SPECIES AND SEPAL.LENGTH
iris <- iris[with(iris, order(Species, Sepal.Length)), ]
# RUN A ROW COUNT FOR RANK BY SPECIES GROUP
iris$rank <- sapply(seq_len(nrow(iris)),
                    function(i) sum(iris$Species[1:i] == iris$Species[i]))
# FILTER DATA FRAME BY TOP 3
iris <- iris[iris$rank <= 3,]
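The row count above compares every row with all rows before it, which gets slow on large data. A shorter base R variant, offered only as a sketch, uses ave() to apply rank() within each Species group. The names df and top3 are made up for this example so that the built-in iris data set is not overwritten.

```r
# Work on a sorted copy so the built-in iris data set stays untouched.
df <- iris[order(iris$Species, iris$Sepal.Length), ]

# ave() splits Sepal.Length by Species, applies FUN to each piece and
# puts the results back in the original row order.
df$rank <- ave(df$Sepal.Length, df$Species,
               FUN = function(x) rank(x, ties.method = "first"))

# Keep the top 3 rows of each group.
top3 <- df[df$rank <= 3, ]
```

Because rank() is given the values directly, the sort in the first line only affects the order of the output rows, not the ranks themselves.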
Upvotes: 4
Reputation: 5532
The following produces the desired result by passing Sepal.Length to rank() explicitly:
library(dplyr)
by_species <- iris %>%
  arrange(Species, Sepal.Length) %>%
  group_by(Species) %>%
  mutate(rank = rank(Sepal.Length, ties.method = "first"))
by_species %>% filter(rank <= 3)
##Source: local data frame [9 x 6]
##Groups: Species [3]
##
##  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species  rank
##         (dbl)       (dbl)        (dbl)       (dbl)     (fctr) (int)
##1          4.3         3.0          1.1         0.1     setosa     1
##2          4.4         2.9          1.4         0.2     setosa     2
##3          4.4         3.0          1.3         0.2     setosa     3
##4          4.9         2.4          3.3         1.0 versicolor     1
##5          5.0         2.0          3.5         1.0 versicolor     2
##6          5.0         2.3          3.3         1.0 versicolor     3
##7          4.9         2.5          4.5         1.7  virginica     1
##8          5.6         2.8          4.9         2.0  virginica     2
##9          5.7         2.5          5.0         2.0  virginica     3
by_species %>% slice(1:3)
##Source: local data frame [9 x 6]
##Groups: Species [3]
##
##  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species  rank
##         (dbl)       (dbl)        (dbl)       (dbl)     (fctr) (int)
##1          4.3         3.0          1.1         0.1     setosa     1
##2          4.4         2.9          1.4         0.2     setosa     2
##3          4.4         3.0          1.3         0.2     setosa     3
##4          4.9         2.4          3.3         1.0 versicolor     1
##5          5.0         2.0          3.5         1.0 versicolor     2
##6          5.0         2.3          3.3         1.0 versicolor     3
##7          4.9         2.5          4.5         1.7  virginica     1
##8          5.6         2.8          4.9         2.0  virginica     2
##9          5.7         2.5          5.0         2.0  virginica     3
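Since iris contains repeated Sepal.Length values, the choice of tie handling decides whether tied rows share a rank. The small comparison below is only an illustration on a made-up vector x; it contrasts rank(ties.method = "first") with the dplyr helpers row_number(), min_rank() and dense_rank().

```r
library(dplyr)

# Hypothetical values with a three-way tie at 4.4.
x <- c(4.3, 4.4, 4.4, 4.4, 4.5)

first_rank <- rank(x, ties.method = "first")  # 1 2 3 4 5 (ties broken by position)
row_rank   <- row_number(x)                   # 1 2 3 4 5 (same as ties.method = "first")
min_rk     <- min_rank(x)                     # 1 2 2 2 5 (ties share the lowest rank, gaps follow)
dense_rk   <- dense_rank(x)                   # 1 2 2 2 3 (ties share a rank, no gaps)
```

With filter(rank <= 3), the "first" style always keeps exactly three rows per group, while min_rank() or dense_rank() can keep more when values are tied.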
Upvotes: 42