Reputation: 93
This is a very simple problem, but I have the following data:
> head(Session_numbers)
ID Session
1 1 1_43392
2 1 1_43392
3 1 1_43392
4 1 1_43394
5 1 1_43394
6 1 1_43394
7 1 1_43398
8 1 1_43401
9 2 2_44502
10 2 2_44502
where ID is the grouping label per subject, and every row has a session code, which corresponds to points in time. I want to number the session codes sequentially in a variable 'Snum' so that each identical session code per ID gets grouped and is given the same number, such as:
ID Session Snum
1 1 1_43392 1
2 1 1_43392 1
3 1 1_43392 1
4 1 1_43394 2
5 1 1_43394 2
6 1 1_43394 2
7 1 1_43398 3
8 1 1_43401 4
9 2 2_44502 1
10 2 2_44502 1
The number of Sessions per ID differs, and every Session code is unique.
I have tried to use ave
, dplyr
and data.table
but I just can't seem to get it right, e.g.:
DT <- data.table(Session_numbers)
DT[, Snum := seq_len(.N), by = list(ID, Session)]
> head(DT)
ID Session Snum
1: 1 1_43392 1
2: 1 1_43392 2
3: 1 1_43392 3
4: 1 1_43394 1
5: 1 1_43394 2
6: 1 1_43394 3
Or using dplyr
, with the following code which gives me an error message:
> Session_numbers %>%
+ group_by(ID, Session) %>%
+ mutate(Snum = row_number())
Error: row_number() should only be called in a data context
Call `rlang::last_error()` to see a backtrace
or with ave
head(Session_numbers)
ID Session num
1 1 1_43392 1
2 1 1_43392 2
3 1 1_43392 3
4 1 1_43394 1
5 1 1_43394 2
6 1 1_43394 3
My question is similar to this one: Count number of observations/rows per group and add result to data frame
What am I missing?
Upvotes: 2
Views: 2836
Reputation: 887511
Here is an option where we group by 'ID' and match
the 'Session' with unique
elements of 'Session' to get the index
library(dplyr)
Session_numbers %>%
group_by(ID) %>%
mutate(num = match(Session, unique(Session)))
# A tibble: 10 x 3
# Groups: ID [2]
# ID Session num
# <int> <chr> <int>
# 1 1 1_43392 1
# 2 1 1_43392 1
# 3 1 1_43392 1
# 4 1 1_43394 2
# 5 1 1_43394 2
# 6 1 1_43394 2
# 7 1 1_43398 3
# 8 1 1_43401 4
# 9 2 2_44502 1
#10 2 2_44502 1
Or using base R
Session_numbers$num <- with(Session_numbers, ave(Session, ID, FUN =
function(x) match(x, unique(x))))
If we are interested in changing the non-adjacent elements to new number, then
library(data.table)
...
%>% mutate(num = rleid(Session))
Session_numbers <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L),
Session = c("1_43392", "1_43392", "1_43392", "1_43394", "1_43394",
"1_43394", "1_43398", "1_43401", "2_44502", "2_44502")),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
Upvotes: 1
Reputation: 40131
Another dplyr
option could be:
df %>%
group_by(ID) %>%
mutate(Snum = cumsum(!duplicated(Session)))
ID Session Snum
<int> <chr> <int>
1 1 1_43392 1
2 1 1_43392 1
3 1 1_43392 1
4 1 1_43394 2
5 1 1_43394 2
6 1 1_43394 2
7 1 1_43398 3
8 1 1_43401 4
9 2 2_44502 1
10 2 2_44502 1
Upvotes: 1