Reputation: 908
I can't find a good solution, but I suspect there is a base R or tidyverse function for this: I want to number each consecutive run of non-NA labels.
My Data:
Row | Label |
---|---|
1 | NA |
2 | Foo |
3 | Foo |
4 | Foo |
5 | NA |
6 | NA |
7 | Foo |
8 | Foo |
9 | NA |
10 | Foo |
11 | NA |
... | ... |
What I want:
Row | Label | FooCluster |
---|---|---|
1 | NA | NA |
2 | Foo | 1 |
3 | Foo | 1 |
4 | Foo | 1 |
5 | NA | NA |
6 | NA | NA |
7 | Foo | 2 |
8 | Foo | 2 |
9 | NA | NA |
10 | Foo | 3 |
11 | NA | NA |
... | ... | ... |
Is there something elegant out there? Thanks for any help!
Upvotes: 2
Views: 40
Reputation: 102880
Here is another option using nested cumsum (although the logic behind it is not as simple as in the answer by @akrun):
transform(
  df,
  FooCluster = replace(
    rep(NA, length(Label)),
    !is.na(Label),
    cumsum(diff(c(0, cumsum(is.na(Label))[!is.na(Label)])) > 0)
  )
)
which gives
Row Label FooCluster
1 1 <NA> NA
2 2 Foo 1
3 3 Foo 1
4 4 Foo 1
5 5 <NA> NA
6 6 <NA> NA
7 7 Foo 2
8 8 Foo 2
9 9 <NA> NA
10 10 Foo 3
11 11 <NA> NA
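Since the one-liner is dense, here is a sketch that unpacks it into named steps. The intermediate names (keep, na_before, new_run, run_id) are made up for illustration and are not part of the answer above; the data is the sample from the question.

```r
# Sample labels from the question
Label <- c(NA, "Foo", "Foo", "Foo", NA, NA, "Foo", "Foo", NA, "Foo", NA)

keep <- !is.na(Label)                    # rows that carry a label
na_before <- cumsum(is.na(Label))[keep]  # NAs seen so far at each labelled row: 1 1 1 3 3 4
new_run <- diff(c(0, na_before)) > 0     # TRUE where the NA count grew since the previous labelled row
run_id <- cumsum(new_run)                # 1 1 1 2 2 3

# Put the run ids back at the labelled positions, NA everywhere else
FooCluster <- replace(rep(NA_integer_, length(Label)), keep, run_id)
```

One caveat, as far as I can tell: if the very first row carries a label, na_before starts at 0, so new_run is FALSE there and the first run is numbered 0. Seeding diff() with -1 instead of 0, i.e. diff(c(-1, na_before)), should avoid that.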
Upvotes: 0
Reputation: 887951
In base R, this can be done with rle
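df1$FooCluster <- inverse.rle(within.list(rle(is.na(df1$Label)), {
  # runs of NA labels get no cluster number
  values[values] <- NA
  # the remaining (labelled) runs are numbered 1, 2, 3, ...
  values[!is.na(values)] <- seq_along(values[!is.na(values)])
}))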
-output
df1
# Row Label FooCluster
#1 1 <NA> NA
#2 2 Foo 1
#3 3 Foo 1
#4 4 Foo 1
#5 5 <NA> NA
#6 6 <NA> NA
#7 7 Foo 2
#8 8 Foo 2
#9 9 <NA> NA
#10 10 Foo 3
#11 11 <NA> NA
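To see what rle contributes, here is a rough equivalent that spells out the run lengths and values instead of modifying the rle object in place. It is a sketch of the same idea, not the code above; the names r and ids are only illustrative.

```r
# Sample labels from the question
Label <- c(NA, "Foo", "Foo", "Foo", NA, NA, "Foo", "Foo", NA, "Foo", NA)

r <- rle(is.na(Label))
r$lengths  # 1 3 2 2 1 1 1
r$values   # TRUE FALSE TRUE FALSE TRUE FALSE TRUE

# One id per run: NA for NA runs, a running count for labelled runs
ids <- ifelse(r$values, NA_integer_, cumsum(!r$values))

# Expand the per-run ids back to one value per row
FooCluster <- rep(ids, r$lengths)
```

Because the counter only advances on labelled runs, this numbering starts at 1 whether or not the data begins with NA.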
Or with rleid from data.table
library(data.table)
setDT(df1)[, grp := rleid(!is.na(Label))][!is.na(Label),
FooCluster := .GRP , grp][, grp := NULL][]
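For readers without data.table, the two steps in that chain can be imitated in base R. This is only a sketch of what rleid and .GRP compute on this data, under the assumption that groups are numbered in order of first appearance; it is not how data.table implements them.

```r
# Sample labels from the question
Label <- c(NA, "Foo", "Foo", "Foo", NA, NA, "Foo", "Foo", NA, "Foo", NA)

labelled <- !is.na(Label)

# Run-length id over labelled / unlabelled stretches, like rleid(!is.na(Label))
r <- rle(labelled)
grp <- rep(seq_along(r$lengths), r$lengths)  # 1 2 2 2 3 3 4 4 5 6 7

# Among labelled rows only, renumber the groups 1, 2, 3, ... in order of appearance, like .GRP
FooCluster <- rep(NA_integer_, length(Label))
FooCluster[labelled] <- match(grp[labelled], unique(grp[labelled]))
```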
-data
df1 <- structure(list(Row = 1:11, Label = c(NA, "Foo", "Foo", "Foo",
  NA, NA, "Foo", "Foo", NA, "Foo", NA)), class = "data.frame", row.names = c(NA,
  -11L))
Upvotes: 1