Reputation: 341
Consider the data:
set.seed(123)
x <- rbinom(12, 1, .5)
y <- (x==0) * rexp(12, 1/100)
z <- (x==1) * rexp(12, 1/220)
group <- sample( rep(1:2, each=6) )
d <- data.frame(x, y, z, group)
Sorting the data first by y, then by z:
d <- d[order(d$y,d$z),]
Now I want to assign a rank within each group. The following code works correctly:
ds <- split(d, d$group)
ds1 <- ds[[1]]
ds1$rank <- 1:nrow(ds1)
ds2 <- ds[[2]]
ds2$rank <- 1:nrow(ds2)
But without splitting the data frame, I want to rank within each group. How can I do that?
Upvotes: 4
Views: 3010
Reputation: 887038
Here is an option using base R. We first order the dataset by the 'group', 'y', and 'z' columns, then use ave to create the sequence within each 'group':
d1 <- d[do.call(order, d[c("group", "y", "z")]),]
row.names(d1) <- NULL
d1$rank <- with(d1, ave(seq_along(group), group, FUN = seq_along))
d1
#   x         y          z group rank
#1  1   0.00000   6.988904     1    1
#2  1   0.00000 329.283431     1    2
#3  1   0.00000 353.287515     1    3
#4  0  35.51413   0.000000     1    4
#5  0  47.87604   0.000000     1    5
#6  0 272.62365   0.000000     1    6
#7  1   0.00000 212.491666     2    1
#8  1   0.00000 257.076377     2    2
#9  1   0.00000 326.760675     2    3
#10 1   0.00000 889.022577     2    4
#11 0  48.02147   0.000000     2    5
#12 0  84.97861   0.000000     2    6
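If the original row order of d needs to be kept, one possible variant (a sketch, not part of the approach above) is to compute the sort permutation once and write the ranks back through it. The name o is only an illustrative helper; the data are rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the example data from the question
set.seed(123)
x <- rbinom(12, 1, .5)
y <- (x == 0) * rexp(12, 1/100)
z <- (x == 1) * rexp(12, 1/220)
group <- sample(rep(1:2, each = 6))
d <- data.frame(x, y, z, group)

# Permutation that sorts the rows by group, then y, then z
o <- order(d$group, d$y, d$z)

# Number the rows 1..n within each group along that sorted order,
# then assign the numbers back to the rows' original positions
d$rank <- NA_integer_
d$rank[o] <- ave(seq_along(o), d$group[o], FUN = seq_along)
```

The rows of d stay where they were; only the new rank column reflects the (y, z) ordering inside each group.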
Upvotes: 2
Reputation: 2621
The dplyr way:
library(dplyr)
d %>%
  arrange(group, y, z) %>%
  group_by(group) %>%
  mutate(rank = 1:n()) %>%
  ungroup()
We first sort the data.frame by group, then y, and then z, then group it by group and assign the rank to each observation.
Result:
# A tibble: 12 × 5
       x         y          z group  rank
   <int>     <dbl>      <dbl> <int> <int>
 1     1   0.00000   6.988904     1     1
 2     1   0.00000 329.283431     1     2
 3     1   0.00000 353.287515     1     3
 4     0  35.51413   0.000000     1     4
 5     0  47.87604   0.000000     1     5
 6     0 272.62365   0.000000     1     6
 7     1   0.00000 212.491666     2     1
 8     1   0.00000 257.076377     2     2
 9     1   0.00000 326.760675     2     3
10     1   0.00000 889.022577     2     4
11     0  48.02147   0.000000     2     5
12     0  84.97861   0.000000     2     6
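A small variation on the same pipeline, offered as a sketch: dplyr's row_number() produces the same within-group sequence as 1:n(), and it sidesteps the colon operator's behaviour of counting down (1:0 gives 1, 0) should a group ever be empty. The name res is only an illustrative choice; the data are rebuilt from the question so the snippet runs on its own.

```r
suppressPackageStartupMessages(library(dplyr))

# Rebuild the example data from the question
set.seed(123)
x <- rbinom(12, 1, .5)
y <- (x == 0) * rexp(12, 1/100)
z <- (x == 1) * rexp(12, 1/220)
group <- sample(rep(1:2, each = 6))
d <- data.frame(x, y, z, group)

res <- d %>%
  arrange(group, y, z) %>%          # sort by group, then y, then z
  group_by(group) %>%               # restart the numbering per group
  mutate(rank = row_number()) %>%   # 1, 2, ... within each group
  ungroup()
```

Because the rows are arranged before grouping, the row position inside each group is the rank by (y, z).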
Upvotes: 4