Reputation: 67
I have a dataframe which contains two columns, id column id
and an observation column val
with observations from a-g in random order.
data.frame(
id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L),
val = c("a", "b", "c", "d", "e", "f", "a", "b", "b", "a", "c", "e", "d", "a", "b","c","f","g")
)
id val
1 1 a
2 1 b
3 1 c
4 1 d
5 1 e
6 2 f
7 2 a
8 2 b
9 3 b
10 3 a
11 3 c
12 3 e
13 3 d
14 4 a
15 4 b
16 4 c
17 4 f
18 4 g
I want to assign an (cumulative) identifier valid
for each unique set of observations (belonging to each id
) given in column val
, like so:
id val valid
1 1 a 1
2 1 b 1
3 1 c 1
4 1 d 1
5 1 e 1
6 2 f 2
7 2 a 2
8 2 b 2
9 3 b 1
10 3 a 1
11 3 c 1
12 3 e 1
13 3 d 1
14 4 a 3
15 4 b 3
16 4 c 3
17 4 f 3
18 4 g 3
Upvotes: 4
Views: 93
Reputation: 52319
In tidyverse
.
val
ues by id
Group by
this new column, and use cur_group_id
to create the valid
columnUnnest
to get the desired outputlibrary(dplyr)
library(tidyr)
df |>
reframe(val = list(sort(val)), .by = id) |>
mutate(valid = cur_group_id(), .by = val) |>
unnest(val)
# # A tibble: 18 × 3
# id val valid
# <int> <chr> <int>
# 1 1 a 1
# 2 1 b 1
# 3 1 c 1
# 4 1 d 1
# 5 1 e 1
# 6 2 a 2
# 7 2 b 2
# 8 2 f 2
# 9 3 a 1
# 10 3 b 1
# 11 3 c 1
# 12 3 d 1
# 13 3 e 1
# 14 4 a 3
# 15 4 b 3
# 16 4 c 3
# 17 4 f 3
# 18 4 g 3
Upvotes: 3
Reputation: 102459
You can try the base R option like below
u <- with(df, by(val, id, sort))
dfout <- merge(
df,
setNames(
stack(setNames(match(u, unique(u)), names(u))),
c("valid", "id")
),
all = TRUE
)
and you will obtain
> dfout
id val valid
1 1 a 1
2 1 b 1
3 1 c 1
4 1 d 1
5 1 e 1
6 2 f 2
7 2 a 2
8 2 b 2
9 3 b 1
10 3 a 1
11 3 c 1
12 3 e 1
13 3 d 1
14 4 a 3
15 4 b 3
16 4 c 3
17 4 f 3
18 4 g 3
Upvotes: 4