Reputation: 607

Sequential numbering in nested groups with common values in R

I have run across similar questions, but have not been able to find an answer for my specific needs.

I have a data set with a nested group design and I need to: i) sequentially number values in each nested group and ii) count the number of values within each nested group...or determine the length of each nested group.

While I regularly conduct this type of data wrangling, the structure of the data set is giving me fits, as the values in one of the groups are repeated both within and between higher-level groups.

Below I have provided an example data set (df) and what the results should look like.

I tried the code below, but it numbers “seq” sequentially across all rows of each “region” within each “sample”, whereas I need the numbering to restart from 1 whenever the “region” sequence is broken. See the “seq” values for “region” S3.2 and S3.1 in the results table for “sample” “a”; hopefully this illustrates what I am trying to articulate. Note that I have not tried to calculate “length”, as I have not yet figured out the issue with “seq”.

df$seq <- ave(df$time,               
            df$sample, df$region,
            FUN = function(x) seq_along(x))

While I am open to all approaches, I prefer solutions using base R, as editing older code every time a package (e.g. dplyr) is updated with changed functions is getting a bit annoying.

Thanks in advance.

df <- read.table(text = "sample time region
a   10  S1
a   11  S1
a   12  S1
a   13  S3.2
a   14  S3.1
a   15  S3.2
a   16  S3.2
a   17  S3.1
a   18  S9.2
a   19  S9.2
a   20  S9.2
a   21  S9.2
a   22  S9.2
a   23  S3.2
a   24  S3.2
a   25  S3.2
a   26  S3.1
b   3   S9.2
b   4   S9.2
b   5   S9.2
b   6   S3.2
b   7   S3.2
b   8   S9.2", header = TRUE)

result <- read.table(text = "sample time region seq length
a   10  S1  1   3
a   11  S1  2   3
a   12  S1  3   3
a   13  S3.2    1   1
a   14  S3.1    1   1
a   15  S3.2    1   2
a   16  S3.2    2   2
a   17  S3.1    1   1
a   18  S9.2    1   5
a   19  S9.2    2   5
a   20  S9.2    3   5
a   21  S9.2    4   5
a   22  S9.2    5   5
a   23  S3.2    1   3
a   24  S3.2    2   3
a   25  S3.2    3   3
a   26  S3.1    1   1
b   3   S9.2    1   3
b   4   S9.2    2   3
b   5   S9.2    3   3
b   6   S3.2    1   2
b   7   S3.2    2   2
b   8   S9.2    1   1", header = TRUE)

Upvotes: 0

Answers (3)

akrun

Reputation: 887951

We can also do

library(data.table)
setDT(df)[, seq := rowid(sample, rleid(region))]

Upvotes: 0

ThomasIsCoding

Reputation: 102880

A data.table option

library(data.table)
setDT(df)[,`:=`(seq = 1:.N,length = .N),.(sample,rleid(region))]

or

setDT(df)[,`:=`(seq = seq_len(.N),length = .N),.(sample,rleid(region))]

gives

    sample time region seq length
 1:      a   10     S1   1      3
 2:      a   11     S1   2      3
 3:      a   12     S1   3      3
 4:      a   13   S3.2   1      1
 5:      a   14   S3.1   1      1
 6:      a   15   S3.2   1      2
 7:      a   16   S3.2   2      2
 8:      a   17   S3.1   1      1
 9:      a   18   S9.2   1      5
10:      a   19   S9.2   2      5
11:      a   20   S9.2   3      5
12:      a   21   S9.2   4      5
13:      a   22   S9.2   5      5
14:      a   23   S3.2   1      3
15:      a   24   S3.2   2      3
16:      a   25   S3.2   3      3
17:      a   26   S3.1   1      1
18:      b    3   S9.2   1      3
19:      b    4   S9.2   2      3
20:      b    5   S9.2   3      3
21:      b    6   S3.2   1      2
22:      b    7   S3.2   2      2
23:      b    8   S9.2   1      1
    sample time region seq length

Upvotes: 1

Ronak Shah

Reputation: 389325

Use rle to get the length of each run:

df$length <- with(rle(paste(df$sample, df$region)), rep(lengths, lengths))
df

#   sample time region length
#1       a   10     S1      3
#2       a   11     S1      3
#3       a   12     S1      3
#4       a   13   S3.2      1
#5       a   14   S3.1      1
#6       a   15   S3.2      2
#7       a   16   S3.2      2
#8       a   17   S3.1      1
#9       a   18   S9.2      5
#10      a   19   S9.2      5
#11      a   20   S9.2      5
#12      a   21   S9.2      5
#13      a   22   S9.2      5
#14      a   23   S3.2      3
#15      a   24   S3.2      3
#16      a   25   S3.2      3
#17      a   26   S3.1      1
#18      b    3   S9.2      3
#19      b    4   S9.2      3
#20      b    5   S9.2      3
#21      b    6   S3.2      2
#22      b    7   S3.2      2
#23      b    8   S9.2      1
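
The same run-length idea can also produce the "seq" column in base R. The following is a minimal sketch, not part of the original answer: it assumes the rows are already in the desired order and that pasting "sample" and "region" with a space gives an unambiguous key (the names `key` and `runs` are introduced here for illustration only).

```r
# Example data from the question
df <- read.table(text = "sample time region
a   10  S1
a   11  S1
a   12  S1
a   13  S3.2
a   14  S3.1
a   15  S3.2
a   16  S3.2
a   17  S3.1
a   18  S9.2
a   19  S9.2
a   20  S9.2
a   21  S9.2
a   22  S9.2
a   23  S3.2
a   24  S3.2
a   25  S3.2
a   26  S3.1
b   3   S9.2
b   4   S9.2
b   5   S9.2
b   6   S3.2
b   7   S3.2
b   8   S9.2", header = TRUE)

# A new run starts whenever the sample/region combination changes
# from one row to the next
key  <- paste(df$sample, df$region)
runs <- rle(key)

# sequence() concatenates 1:n for each run length, so numbering restarts per run
df$seq <- sequence(runs$lengths)

# Repeat each run length once per row in that run
df$length <- rep(runs$lengths, runs$lengths)

df
```

Because `rle()` only compares adjacent rows, a region that reappears later in the same sample (such as S3.2 in sample "a") starts a new run, which is the behaviour `ave()` grouped by sample and region cannot give.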

Upvotes: 1
