Counting rows in nested groups

Question

This extends a previous question of mine. Unfortunately, I have not been able to adapt the earlier code to this new requirement.

I have a data set with a nested group design and need to number sub-groups sequentially. The example data frame below illustrates its structure. GROUP_LENGTH is the number of rows that share the same GROUP. Rows of the same GROUP are consecutive, and some groups span several rows (e.g., a2 and c4 occur more than once). GROUP_SEQ numbers the rows sequentially within each GROUP.
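Both helper columns rely on run-length encoding. Here is a minimal sketch of the idiom applied to the GROUP values of SITE a, SAMPLE 1, assuming (as in the example) that rows of the same GROUP are adjacent. The names `g`, `r`, `group_length` and `group_seq` are illustrative only.

```r
# GROUP values for SITE "a", SAMPLE 1 in the example data
g <- c("a1", "a2", "a2", "a3")

# rle() collapses consecutive identical values into runs:
# r$values is "a1" "a2" "a3", r$lengths is 1 2 1
r <- rle(g)

# GROUP_LENGTH: repeat each run's length once for every row in that run
group_length <- rep(r$lengths, r$lengths)   # 1 2 2 1

# GROUP_SEQ: count 1, 2, ... within each run
group_seq <- sequence(r$lengths)            # 1 1 2 1
```

The same two expressions, applied to a key built with paste(), are what the GROUP_LENGTH and GROUP_SEQ lines in the question's code compute.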

My problem is with SEQ_SAMPLE. My current code numbers GROUP sequentially across the whole data set, but I need the numbering to restart within each combination of SITE and SAMPLE (see the result data frame).

I have tried variations of this code using ave, but none of them worked. I can see how to do it with a for loop, but I would rather avoid one because my data set is large. I am open to any solution, but I prefer base R because I want to avoid loading packages. Thanks in advance for your help.
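For reference, the SEQ_SAMPLE rule can be written as an explicit loop. This is only a sketch: `seq_sample_loop` is a hypothetical helper, not part of the original code, and it assumes the rows are sorted so that each SITE/SAMPLE block, and each GROUP run within it, is contiguous. It is mainly useful as a readable statement of the rule and as a check for faster vectorized versions.

```r
# Hypothetical helper: number GROUP runs, restarting at 1 in each SITE/SAMPLE block.
# Assumes rows are sorted so each block and each GROUP run within it is contiguous.
seq_sample_loop <- function(site, sample, group) {
  n <- length(group)
  out <- integer(n)
  for (i in seq_len(n)) {
    new_block <- i == 1 || site[i] != site[i - 1] || sample[i] != sample[i - 1]
    if (new_block) {
      out[i] <- 1L                      # first run of a new SITE/SAMPLE block
    } else if (group[i] != group[i - 1]) {
      out[i] <- out[i - 1] + 1L         # GROUP changed: start the next run
    } else {
      out[i] <- out[i - 1]              # same GROUP: same run number
    }
  }
  out
}

# Toy input: two SAMPLE blocks within SITE "a"
seq_sample_loop(site   = c("a", "a", "a", "a", "a"),
                sample = c(1, 1, 2, 2, 2),
                group  = c("a1", "a2", "b1", "b1", "b3"))
#> [1] 1 2 1 1 2
```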

df <- read.table(text = "SITE   SAMPLE  TIME    GROUP
a   1   1   a1
a   1   2   a2
a   1   3   a2
a   1   4   a3
a   2   1   b1
a   2   2   b1
a   2   3   b1
a   2   4   b3
a   2   5   b3
a   2   6   b3
b   1   1   c1
b   1   2   c2
b   1   3   c3
b   1   4   c4
b   1   5   c4
b   2   1   d1
b   2   2   d2
b   2   3   d3
", header = TRUE)

result <- read.table(text = "SITE   SAMPLE  TIME    GROUP   GROUP_LENGTH    GROUP_SEQ   SEQ_SAMPLE
a   1   1   a1  1   1   1
a   1   2   a2  2   1   2
a   1   3   a2  2   2   2
a   1   4   a3  1   1   3
a   2   1   b1  3   1   1
a   2   2   b1  3   2   1
a   2   3   b1  3   3   1
a   2   4   b3  3   1   2
a   2   5   b3  3   2   2
a   2   6   b3  3   3   2
b   1   1   c1  1   1   1
b   1   2   c2  1   1   2
b   1   3   c3  1   1   3
b   1   4   c4  2   1   4
b   1   5   c4  2   2   4
b   2   1   d1  1   1   1
b   2   2   d2  1   1   2
b   2   3   d3  1   1   3
", header = TRUE)

# The first two lines work fine; the bug is in the last line.
df$GROUP_LENGTH = with(rle(paste(df$SAMPLE, df$GROUP)), rep(lengths, lengths))
df$GROUP_SEQ = sequence(rle(paste(df$SAMPLE, df$GROUP))$lengths)

df$SEQ_SAMPLE = with(rle(paste(df$SITE, df$SAMPLE, df$GROUP)), rep(seq_along(values),lengths))
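The last line numbers runs across the whole data frame, so the count never restarts. One possible base-R fix that keeps the rle approach is to subtract the run number found at the first row of each SITE/SAMPLE block. This works because the rle key includes SITE and SAMPLE, so every block begins a new run. It is a sketch that assumes each block's rows are contiguous, as in the example; `global_seq`, `block` and `first_row` are illustrative names, and the data frame is rebuilt with only the columns needed here.

```r
# Example data from the question (SITE, SAMPLE, GROUP only; TIME is not needed here)
df <- data.frame(
  SITE   = rep(c("a", "b"), times = c(10, 8)),
  SAMPLE = c(rep(1, 4), rep(2, 6), rep(1, 5), rep(2, 3)),
  GROUP  = c("a1", "a2", "a2", "a3", "b1", "b1", "b1", "b3", "b3", "b3",
             "c1", "c2", "c3", "c4", "c4", "d1", "d2", "d3")
)

# Run number over the whole data frame (what the original last line computes)
r <- with(df, rle(paste(SITE, SAMPLE, GROUP)))
global_seq <- rep(seq_along(r$values), r$lengths)

# Index of the first row of each SITE/SAMPLE block
block <- paste(df$SITE, df$SAMPLE)
first_row <- match(block, block)

# Shift each block so that its first run is numbered 1
df$SEQ_SAMPLE <- global_seq - global_seq[first_row] + 1L
df$SEQ_SAMPLE
#> [1] 1 2 2 3 1 1 1 2 2 2 1 2 3 4 4 1 2 3
```

If the real data are not already sorted, ordering them first (for example by SITE, SAMPLE and TIME) keeps the contiguity assumption valid.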

Ronak Shah · Accepted Answer

You can use ave() to number the runs of GROUP within each combination of SITE and SAMPLE.

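# as.character() guards against GROUP being a factor (the read.table() default before R 4.0.0),
# which rle() rejects; ave() returns character here, hence the as.integer() wrapper
df$SEQ_SAMPLE = with(df, as.integer(ave(as.character(GROUP), SITE, SAMPLE, 
               FUN = function(x) with(rle(x), rep(seq_along(values), lengths)))))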


identical(df$SEQ_SAMPLE, result$SEQ_SAMPLE)
#[1] TRUE
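To see what ave() does here, the sketch below runs the same function on toy vectors. ave() splits the input by the grouping variables, applies FUN to each piece, and writes the results back in the original row order. Because the results are written into a copy of the character input, they come back as character strings, which is why the answer wraps the call in as.integer(). The helper name `run_id` and the toy vectors are illustrative only.

```r
group  <- c("a1", "a2", "a2", "b1", "b3")
sample <- c(1, 1, 1, 2, 2)

# Hypothetical helper: number runs of identical values within a vector
run_id <- function(x) with(rle(x), rep(seq_along(values), lengths))

out <- ave(group, sample, FUN = run_id)
out               # character, because ave() fills a copy of 'group'
#> [1] "1" "2" "2" "1" "2"
as.integer(out)
#> [1] 1 2 2 1 2
```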
