Reputation: 82
I have a data frame of classes, ordered by period (or block) within a day. I'd like to add another variable that numbers each group of identical classes as a series, but only when they occur one after another. So if there are two math classes in periods 4 and 5, that would be one group, while the math classes in periods 7 and 8 would be a different group. I'm interested in a dplyr method, but other methods will work as well.
I've tried to do group_by with mutate, but I'm missing a step.
df <- data.frame(
period = c(1:8),
classes = c("hist", "hist", "hist",
"math", "math",
"physics",
"math", "math")
)
I want the following output:
df <- data.frame(
period = c(1:8),
classes = c("hist", "hist", "hist",
"math", "math",
"physics",
"math", "math"),
series = c(1, 1, 1, 2, 2, 3, 4, 4)
)
Upvotes: 2
Views: 149
Reputation: 18661
We can also use rleid from data.table:
library(data.table)
setDT(df)[,series := rleid(classes)]
In a dplyr pipe:
library(dplyr)
df %>%
mutate(series = data.table::rleid(classes))
Output:
period classes series
1: 1 hist 1
2: 2 hist 1
3: 3 hist 1
4: 4 math 2
5: 5 math 2
6: 6 physics 3
7: 7 math 4
8: 8 math 4
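If you would rather avoid the data.table dependency, the same run IDs can be built with dplyr alone. This is a minimal sketch, assuming classes contains no NA values: flag each row whose class differs from the previous row, then take a running sum of those flags. The name result is only for illustration.

```r
library(dplyr)

df <- data.frame(
  period = 1:8,
  classes = c("hist", "hist", "hist", "math", "math", "physics", "math", "math")
)

# A new series starts whenever the class differs from the previous row.
# lag() would give NA for the first row, so default = first(classes) makes
# that first comparison FALSE; adding 1 then starts the numbering at 1.
result <- df %>%
  mutate(series = cumsum(classes != lag(classes, default = first(classes))) + 1)
```

In dplyr 1.1.0 and later, consecutive_id(classes) inside mutate() should give the same numbering without the manual comparison.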
Upvotes: 3
Reputation: 10996
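One naive approach is a for loop:
series <- rep(1, nrow(df))
# seq_len(...)[-1] is empty for a one-row data frame, unlike 2:nrow(df)
for (i in seq_len(nrow(df))[-1])
{
# continue the current run if the class matches the previous row
same <- identical(df$classes[i-1], df$classes[i])
series[i] <- if (same) series[i-1] else series[i-1] + 1
}
df$series <- series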
Upvotes: 0
Reputation: 6106
You can use rle():
rle_length <- rle(as.character(df$classes))$lengths
df$series <- rep(seq_along(rle_length), rle_length)
> df
period classes series
1 1 hist 1
2 2 hist 1
3 3 hist 1
4 4 math 2
5 5 math 2
6 6 physics 3
7 7 math 4
8 8 math 4
>
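To see why this works, it can help to look at what rle() returns for the example data. The sketch below uses a standalone vector rather than df, and the names runs and series are just for illustration.

```r
classes <- c("hist", "hist", "hist", "math", "math", "physics", "math", "math")

# rle() collapses consecutive equal values into runs
runs <- rle(classes)
runs$lengths  # 3 2 1 2
runs$values   # "hist" "math" "physics" "math"

# Repeat each run's index by its length so every row gets its run number
series <- rep(seq_along(runs$lengths), times = runs$lengths)
```

Because the two math runs are separated by physics, they appear as separate entries in runs$values and so get different series numbers.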
Upvotes: 2