Reputation: 294
I want to generate a new variable in a dataset. This variable should count the occurrences of values within groups defined by another variable.
Here is an example data frame:
x <- c(1, 1, 2, 3, 3, 3, 4, 4)
y <- c(5, 4, 4, 5, 5, 5, 1, 1)
dat <- data.frame(x, y)
dat
x y
1 1 5
2 1 4
3 2 4
4 3 5
5 3 5
6 3 5
7 4 1
8 4 1
Now I want to generate a new variable, let's call it z. z should be a running count of each y value within the groups defined by x (1, 2, 3, 4), so that repeated values of y are numbered 1, 2, 3, and so on. The result should look like this:
x y z
1 1 5 1
2 1 4 1
3 2 4 1
4 3 5 1
5 3 5 2
6 3 5 3
7 4 1 1
8 4 1 2
Is there a way to do that with dplyr?
Upvotes: 3
Views: 708
Reputation: 887048
One option is to group by both x and y and create a sequence column with row_number():
library(dplyr)
dat %>%
  group_by(x, y) %>%
  mutate(z = row_number())
# A tibble: 8 x 3
# Groups: x, y [5]
# x y z
# <dbl> <dbl> <int>
#1 1 5 1
#2 1 4 1
#3 2 4 1
#4 3 5 1
#5 3 5 2
#6 3 5 3
#7 4 1 1
#8 4 1 2
The same in base R:
dat$z <- with(dat, ave(seq_along(x), x, y, FUN = seq_along))
Or with data.table:
library(data.table)
setDT(dat)[, z := seq_len(.N), .(x, y)]
Or more compactly, using rowid():
setDT(dat)[, z := rowid(x, y)]
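As a quick check of the idea behind all of these variants, here is a minimal base R sketch that rebuilds the example data from the question and numbers the rows within each (x, y) combination. It only uses ave() and seq_along() and assumes nothing beyond the data shown in the question.

```r
# Rebuild the example data from the question
dat <- data.frame(
  x = c(1, 1, 2, 3, 3, 3, 4, 4),
  y = c(5, 4, 4, 5, 5, 5, 1, 1)
)

# Number the rows within each (x, y) combination:
# ave() splits the row indices by x and y and applies seq_along to each piece
dat$z <- ave(seq_len(nrow(dat)), dat$x, dat$y, FUN = seq_along)

print(dat$z)
# 1 1 1 1 2 3 1 2
```

The printed vector matches the z column requested in the question.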
Upvotes: 3
Reputation: 39858
One possibility, which reproduces the expected output for the example data:
library(dplyr)
dat %>%
  group_by(x) %>%
  mutate(z = cumsum(duplicated(y)) + 1)
x y z
<dbl> <dbl> <dbl>
1 1 5 1
2 1 4 1
3 2 4 1
4 3 5 1
5 3 5 2
6 3 5 3
7 4 1 1
8 4 1 2
The same with base R:
with(dat, ave(y, x, FUN = function(x) cumsum(duplicated(x)) + 1))
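Note that cumsum(duplicated(y)) + 1 counts duplicates cumulatively across all y values within an x group, so it can differ from a per-(x, y) counter when a group contains more than one repeated value. The sketch below uses hypothetical data (not from the question) in which the y values alternate within a single x group, to illustrate the difference.

```r
# Hypothetical data: a single x group whose y values alternate
x <- c(1, 1, 1, 1)
y <- c(5, 4, 5, 4)

# Counter restarted for every (x, y) combination
z_pair <- ave(seq_along(x), x, y, FUN = seq_along)

# Cumulative number of duplicated y values within each x group, plus one
z_cum <- ave(y, x, FUN = function(v) cumsum(duplicated(v)) + 1)

print(z_pair)
# 1 1 2 2
print(z_cum)
# 1 1 2 3
```

Which of the two is wanted depends on whether z should restart for each distinct y value; for the example data in the question both give the same result.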
Upvotes: 2