Kalaschnik

Reputation: 908

Cluster rows that have the same label and are next to each other

I can't find a good solution, but I suspect there may even be a base R or tidyverse function that does this:

My data:

Row Label
1 NA
2 Foo
3 Foo
4 Foo
5 NA
6 NA
7 Foo
8 Foo
9 NA
10 Foo
11 NA
... ...

What I want:

Row Label FooCluster
1 NA NA
2 Foo 1
3 Foo 1
4 Foo 1
5 NA NA
6 NA NA
7 Foo 2
8 Foo 2
9 NA NA
10 Foo 3
11 NA NA
... ... ...

Is there something elegant out there? Thanks for any help!

Upvotes: 2

Views: 40

Answers (2)

ThomasIsCoding

Reputation: 102880

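Here is another option using nested cumsum calls (although the logic behind it is not as simple as in @akrun's answer):

transform(
  df1,
  FooCluster = replace(
    rep(NA, length(Label)),   # start with all NA
    !is.na(Label),            # fill only the labelled rows
    # the running NA count jumps between two labelled rows only when an NA
    # separates them, which starts a new cluster; seeding diff() with -1
    # (instead of 0) lets a label in row 1 still open cluster 1
    cumsum(diff(c(-1, cumsum(is.na(Label))[!is.na(Label)])) > 0)
  )
)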

which gives

   Row Label FooCluster
1    1  <NA>         NA
2    2   Foo          1
3    3   Foo          1
4    4   Foo          1
5    5  <NA>         NA
6    6  <NA>         NA
7    7   Foo          2
8    8   Foo          2
9    9  <NA>         NA
10  10   Foo          3
11  11  <NA>         NA
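To make the nested cumsum logic easier to follow, here is a step-by-step sketch of the same idea with each intermediate vector named. It assumes the Label vector from the question; the names na_count, at_label and new_cluster are illustrative only. It seeds diff() with -1 so that a label in the very first row still opens cluster 1 (seeding with 0 would give that first cluster the id 0).

```r
# Label column from the question's example data
Label <- c(NA, "Foo", "Foo", "Foo", NA, NA, "Foo", "Foo", NA, "Foo", NA)

# Running count of NAs seen so far; it stays constant within a run of labels
na_count <- cumsum(is.na(Label))

# Keep the NA count only at the rows that carry a label
at_label <- na_count[!is.na(Label)]

# The count jumps between two consecutive labelled rows only if at least one
# NA separated them, so a positive difference marks the start of a new cluster
new_cluster <- diff(c(-1, at_label)) > 0

# Numbering the cluster starts gives the cluster id of every labelled row
cluster_id <- cumsum(new_cluster)

# Put the ids back into a full-length vector, leaving unlabelled rows as NA
FooCluster <- rep(NA_integer_, length(Label))
FooCluster[!is.na(Label)] <- cluster_id
print(FooCluster)
```

Here at_label is 1 1 1 3 3 4 and cluster_id is 1 1 1 2 2 3, which matches the FooCluster column shown above.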

Upvotes: 0

akrun

Reputation: 887951

In base R, this can be done with rle

df1$FooCluster <- inverse.rle(within.list(rle(is.na(df1$Label)), {
         # runs of missing labels become NA
         values[values] <- NA
         # the remaining (labelled) runs are numbered 1, 2, 3, ...
         values[!is.na(values)] <- seq_along(values[!is.na(values)])}))

-output

df1
#   Row Label FooCluster
#1    1  <NA>         NA
#2    2   Foo          1
#3    3   Foo          1
#4    4   Foo          1
#5    5  <NA>         NA
#6    6  <NA>         NA
#7    7   Foo          2
#8    8   Foo          2
#9    9  <NA>         NA
#10  10   Foo          3
#11  11  <NA>         NA
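To see what the rle() approach works on, the sketch below unrolls it without within.list(): it inspects the run-length encoding of is.na(Label), swaps each run's value for NA or a running cluster number, and expands the runs back with inverse.rle(). It assumes the Label vector from the question; runs and ids are illustrative names, not part of the answer above.

```r
# Label column from the question's example data
Label <- c(NA, "Foo", "Foo", "Foo", NA, NA, "Foo", "Foo", NA, "Foo", NA)

# Run-length encoding of "is this label missing?":
# lengths 1 3 2 2 1 1 1 with values TRUE FALSE TRUE FALSE TRUE FALSE TRUE
runs <- rle(is.na(Label))
print(runs)

# One new value per run: NA for missing runs, 1, 2, 3, ... for labelled runs
ids <- rep(NA_integer_, length(runs$values))
ids[!runs$values] <- seq_len(sum(!runs$values))
runs$values <- ids

# Expand every run back to its original length, one value per row
FooCluster <- inverse.rle(runs)
print(FooCluster)
```

Because each run is relabelled once and then repeated over its length, adjacent labelled rows automatically share a cluster number.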

Or with rleid from data.table

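library(data.table)
# grp: run id of consecutive labelled / unlabelled rows;
# .GRP then numbers the labelled runs 1, 2, 3, ... in order of appearance
setDT(df1)[, grp := rleid(!is.na(Label))][!is.na(Label), 
      FooCluster := .GRP , grp][, grp := NULL][]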
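For readers without data.table at hand, the sketch below imitates the two steps of that answer in base R. It is an approximation under stated assumptions, not data.table's implementation: the cumsum() expression stands in for rleid(), and match() against unique() stands in for .GRP when grouping the labelled rows by grp in order of appearance. The names has_label and grp are illustrative.

```r
# Label column from the question's example data
Label <- c(NA, "Foo", "Foo", "Foo", NA, NA, "Foo", "Foo", NA, "Foo", NA)
has_label <- !is.na(Label)

# Base R stand-in for rleid(): start a new id wherever the value changes
grp <- cumsum(c(TRUE, head(has_label, -1) != tail(has_label, -1)))
print(grp)

# Stand-in for .GRP: number the grp values of the labelled rows
# in order of first appearance
FooCluster <- rep(NA_integer_, length(Label))
FooCluster[has_label] <- match(grp[has_label], unique(grp[has_label]))
print(FooCluster)
```

grp comes out as 1 2 2 2 3 3 4 4 5 6 7; the labelled rows carry the run ids 2, 4 and 6, which are then renumbered to 1, 2 and 3.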

data

df1 <- structure(list(Row = 1:11, Label = c(NA, "Foo", "Foo", "Foo", 
NA, NA, "Foo", "Foo", NA, "Foo", NA)), class = "data.frame", row.names = c(NA, 
-11L))

Upvotes: 1
