Reputation: 607
I have run across similar questions, but have not been able to find an answer for my specific needs.
I have a data set with a nested group design and I need to: i) sequentially number the values in each nested group and ii) count the number of values within each nested group (i.e. determine the length of each nested group).
While I regularly conduct this type of data wrangling, the structure of this data set is giving me fits, as the values in one of the groups are repeated both within and between higher-level groups.
Below I have provided an example data set (df) and what the results should look like.
I tried the code below, but it numbers “seq” sequentially across every occurrence of a “region” within a “sample”, whereas I need the numbering to restart from 1 whenever a run of the same “region” is broken. See the “seq” values for “region” S3.2 and S3.1 in the results table for “sample” “a”; hopefully this illustrates what I am trying to articulate. Note that I have not tried to calculate “length” yet, as I have not figured out the issue with “seq”.
df$seq <- ave(df$time,
df$sample, df$region,
FUN = function(x) seq_along(x))
While I am open to all approaches, I prefer base R solutions, as editing older code every time a package (e.g. dplyr) changes its functions is getting a bit annoying.
Thanks in advance.
df <- read.table(text = "sample time region
a 10 S1
a 11 S1
a 12 S1
a 13 S3.2
a 14 S3.1
a 15 S3.2
a 16 S3.2
a 17 S3.1
a 18 S9.2
a 19 S9.2
a 20 S9.2
a 21 S9.2
a 22 S9.2
a 23 S3.2
a 24 S3.2
a 25 S3.2
a 26 S3.1
b 3 S9.2
b 4 S9.2
b 5 S9.2
b 6 S3.2
b 7 S3.2
b 8 S9.2", header = TRUE)
result <- read.table(text = "sample time region seq length
a 10 S1 1 3
a 11 S1 2 3
a 12 S1 3 3
a 13 S3.2 1 1
a 14 S3.1 1 1
a 15 S3.2 1 2
a 16 S3.2 2 2
a 17 S3.1 1 1
a 18 S9.2 1 5
a 19 S9.2 2 5
a 20 S9.2 3 5
a 21 S9.2 4 5
a 22 S9.2 5 5
a 23 S3.2 1 3
a 24 S3.2 2 3
a 25 S3.2 3 3
a 26 S3.1 1 1
b 3 S9.2 1 3
b 4 S9.2 2 3
b 5 S9.2 3 3
b 6 S3.2 1 2
b 7 S3.2 2 2
b 8 S9.2 1 1", header = TRUE)
Upvotes: 0
Views: 136
Reputation: 887951
We can also do
library(data.table)
setDT(df)[, seq := rowid(sample, rleid(region))]
Upvotes: 0
Reputation: 102880
A data.table option
library(data.table)
setDT(df)[, `:=`(seq = 1:.N, length = .N), .(sample, rleid(region))]
or
setDT(df)[, `:=`(seq = seq_along(.I), length = .N), .(sample, rleid(region))]
gives
sample time region seq length
1: a 10 S1 1 3
2: a 11 S1 2 3
3: a 12 S1 3 3
4: a 13 S3.2 1 1
5: a 14 S3.1 1 1
6: a 15 S3.2 1 2
7: a 16 S3.2 2 2
8: a 17 S3.1 1 1
9: a 18 S9.2 1 5
10: a 19 S9.2 2 5
11: a 20 S9.2 3 5
12: a 21 S9.2 4 5
13: a 22 S9.2 5 5
14: a 23 S3.2 1 3
15: a 24 S3.2 2 3
16: a 25 S3.2 3 3
17: a 26 S3.1 1 1
18: b 3 S9.2 1 3
19: b 4 S9.2 2 3
20: b 5 S9.2 3 3
21: b 6 S3.2 1 2
22: b 7 S3.2 2 2
23: b 8 S9.2 1 1
Upvotes: 1
Reputation: 389325
Use rle:
df$length <- with(rle(paste(df$sample, df$region)), rep(lengths, lengths))
df
# sample time region length
#1 a 10 S1 3
#2 a 11 S1 3
#3 a 12 S1 3
#4 a 13 S3.2 1
#5 a 14 S3.1 1
#6 a 15 S3.2 2
#7 a 16 S3.2 2
#8 a 17 S3.1 1
#9 a 18 S9.2 5
#10 a 19 S9.2 5
#11 a 20 S9.2 5
#12 a 21 S9.2 5
#13 a 22 S9.2 5
#14 a 23 S3.2 3
#15 a 24 S3.2 3
#16 a 25 S3.2 3
#17 a 26 S3.1 1
#18 b 3 S9.2 3
#19 b 4 S9.2 3
#20 b 5 S9.2 3
#21 b 6 S3.2 2
#22 b 7 S3.2 2
#23 b 8 S9.2 1
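The same run-length encoding can probably also supply the "seq" column the question asks for, still in base R. The following is only a sketch, not part of the original answer: it assumes the rows are already ordered as in the example, rebuilds the example df compactly, and uses sequence() to restart the counter at 1 for every run. The name runs is introduced here purely for illustration.

```r
# Rebuild the example data from the question (same 23 rows, written compactly)
df <- data.frame(
  sample = rep(c("a", "b"), c(17, 6)),
  time   = c(10:26, 3:8),
  region = c(rep("S1", 3), "S3.2", "S3.1", "S3.2", "S3.2", "S3.1",
             rep("S9.2", 5), rep("S3.2", 3), "S3.1",
             rep("S9.2", 3), "S3.2", "S3.2", "S9.2")
)

# Run-length encode the combined sample/region key, so that a new run starts
# whenever either the sample or the region changes between consecutive rows
runs <- rle(paste(df$sample, df$region))

# sequence() concatenates 1:n for each run length, so numbering restarts per run
df$seq <- sequence(runs$lengths)

# Repeat each run length once per row of that run
df$length <- rep(runs$lengths, runs$lengths)

df
```

Because both columns come from one rle() call, the result depends only on row order, so any sorting needs to happen before this step.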
Upvotes: 1