Reputation: 607
This is an extension of a previous question that I have asked. Unfortunately, I have not successfully modified previous code to fit this new requirement.
I have a data set with a nested group design and need to sequentially count sub-groups. The example data frame below illustrates the structure of my data set. GROUP_LENGTH
is the number of rows within the same GROUP
and note that the rows within these groups are counted sequentially (i.e., a1
and c2
occur multiple times). GROUP_LENGTH
sequentially numbers the rows within each GROUP_LENGTH
.
My issues arise with SEQ_SAMPLE
. My current code sequentially numbers the GROUP
for the whole data set, but I need the code to sequentially number the GROUP
within each SITE
and SAMPLE
(see Result
dataframe).
I have tried variations of this code using ave
, which has not worked. I also see how to use a for
loop, but prefer to avoid this if possible as I have a large data set. I am also open to all solutions, but prefer base package
solutions as I want to avoid having to load packages.
Thanks in advance for your help.
df <- read.table(text = "SITE SAMPLE TIME GROUP
a 1 1 a1
a 1 2 a2
a 1 3 a2
a 1 4 a3
a 2 1 b1
a 2 2 b1
a 2 3 b1
a 2 4 b3
a 2 5 b3
a 2 6 b3
b 1 1 c1
b 1 2 c2
b 1 3 c3
b 1 4 c4
b 1 5 c4
b 2 1 d1
b 2 2 d2
b 2 3 d3
", header = TRUE)
result <- read.table(text = "SITE SAMPLE TIME GROUP GROUP_LENGTH GROUP_SEQ SEQ_SAMPLE
a 1 1 a1 1 1 1
a 1 2 a2 2 1 2
a 1 3 a2 2 2 2
a 1 4 a3 1 1 3
a 2 1 b1 3 1 1
a 2 2 b1 3 2 1
a 2 3 b1 3 3 1
a 2 4 b3 3 1 2
a 2 5 b3 3 2 2
a 2 6 b3 3 3 2
b 1 1 c1 1 1 1
b 1 2 c2 1 1 2
b 1 3 c3 1 1 3
b 1 4 c4 2 1 4
b 1 5 c4 2 2 4
b 2 1 d1 1 1 1
b 2 2 d2 1 2 2
b 2 3 d3 1 3 3
", header = TRUE)
# First two lines of the code work fine....but it is the last line where the bug lines...
df$GROUP_LENGTH = with(rle(paste(df$SAMPLE, df$GROUP)), rep(lengths, lengths))
df$GROUP_SEQ = sequence(rle(paste(df$SAMPLE, df$GROUP))$lengths)
df$SEQ_SAMPLE = with(rle(paste(df$SITE, df$SAMPLE, df$GROUP)), rep(seq_along(values),lengths))
Upvotes: 1
Views: 323
Reputation: 26236
A tidyverse solution, which is very easy as it is direct application of three basic functions in dplyr,
n()
returning count of rowsrow_number()
which numbers the rows requentiallydense_rank()
which provides each element an identitylibrary(dplyr)
df %>% group_by(GROUP) %>%
mutate(GROUP_LENGTH = n(),
GROUP_SEQ = row_number()) %>%
group_by(SITE, SAMPLE) %>%
mutate(SEQ_SAMPLE = dense_rank(GROUP)) %>%
ungroup()
# A tibble: 18 x 7
SITE SAMPLE TIME GROUP GROUP_LENGTH GROUP_SEQ SEQ_SAMPLE
<chr> <int> <int> <chr> <int> <int> <int>
1 a 1 1 a1 1 1 1
2 a 1 2 a2 2 1 2
3 a 1 3 a2 2 2 2
4 a 1 4 a3 1 1 3
5 a 2 1 b1 3 1 1
6 a 2 2 b1 3 2 1
7 a 2 3 b1 3 3 1
8 a 2 4 b3 3 1 2
9 a 2 5 b3 3 2 2
10 a 2 6 b3 3 3 2
11 b 1 1 c1 1 1 1
12 b 1 2 c2 1 1 2
13 b 1 3 c3 1 1 3
14 b 1 4 c4 2 1 4
15 b 1 5 c4 2 2 4
16 b 2 1 d1 1 1 1
17 b 2 2 d2 1 1 2
18 b 2 3 d3 1 1 3
Upvotes: 1
Reputation: 389355
You may use ave
and create unique number for GROUP
within each SITE
and SAMPLE
.
df$SEQ_SAMPLE = with(df, as.integer(ave(GROUP, SITE, SAMPLE,
FUN = function(x) with(rle(x), rep(seq_along(values), lengths)))))
identical(df$SEQ_SAMPLE, result$SEQ_SAMPLE)
#[1] TRUE
Upvotes: 3