Reputation: 6199

How to group a sequence of values in a column in r?

Goal

I have a column called 'State' which contains different characters. One of the characters is 'Following'. I want to label the 1st sequence (consecutive values) of 'Following' as '1st', the second as '2nd' and so on. Following is the example data:

Data

> dput(foo)
structure(list(Vehicle.ID2 = c("3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588"), Frame.ID = 2110:2146, 
    State = c("Following", "Following", "Following", "Following", 
    "Following", "Following", "Following", "Following", "Following", 
    "Approaching-fastveh", "Approaching-fastveh", "Approaching-fastveh", 
    "Following", "Following", "Following", "Following", "Following", 
    "Following", "Following", "Following", "Following", "Following", 
    "Following", "Following", "Following", "Following", "Following", 
    "Following", "Following", "Following", "Approaching-fastveh", 
    "Approaching-fastveh", "Approaching-fastveh", "Approaching-fastveh", 
    "Approaching-fastveh", "Approaching-fastveh", "Approaching-fastveh"
    )), .Names = c("Vehicle.ID2", "Frame.ID", "State"), class = c("tbl_df", 
"data.frame"), row.names = c(NA, -37L))

Desired Output

structure(list(Vehicle.ID2 = c("3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588"), Frame.ID = 2110:2146, 
    State = c("Following", "Following", "Following", "Following", 
    "Following", "Following", "Following", "Following", "Following", 
    "Approaching-fastveh", "Approaching-fastveh", "Approaching-fastveh", 
    "Following", "Following", "Following", "Following", "Following", 
    "Following", "Following", "Following", "Following", "Following", 
    "Following", "Following", "Following", "Following", "Following", 
    "Following", "Following", "Following", "Approaching-fastveh", 
    "Approaching-fastveh", "Approaching-fastveh", "Approaching-fastveh", 
    "Approaching-fastveh", "Approaching-fastveh", "Approaching-fastveh"
    ), grp = c("1st", "1st", "1st", "1st", "1st", "1st", "1st", 
    "1st", "1st", ".", ".", ".", "2nd", "2nd", "2nd", "2nd", 
    "2nd", "2nd", "2nd", "2nd", "2nd", "2nd", "2nd", "2nd", "2nd", 
    "2nd", "2nd", "2nd", "2nd", "2nd", ".", ".", ".", ".", ".", 
    ".", ".")), .Names = c("Vehicle.ID2", "Frame.ID", "State", 
"grp"), row.names = c(NA, -37L), class = c("tbl_df", "data.frame"
))

What I tried

I tried using rle but it can only count the unique values. Please guide me how I can solve this problem. Please note that in original data I have multiple Vehicle.ID2 so I prefer using dplyr.

Upvotes: 1

Answers (4)

lmo

Reputation: 38520

Here is a second method using rle:

# get rle
myRle <- rle(df$State)
# get categories
counts <- cumsum(myRle$values == "Following")
# add in missings
is.na(counts) <- myRle$values != "Following"
# build variable
df$grp <- rep(counts, myRle$lengths)

Upvotes: 1

Anton

Reputation: 1538

You can do this using a loop, but it's not as elegant as the rle and dplyr solutions posted here.

temp <- rep(0,37)
label <- 1
now_following <- 0
for (i in 1:length(temp)) {
  if (foo$State[i] == "Following") {
    temp[i] <- label
    now_following <- 1
  } else {
    temp[i] <- 0
    if (now_following) {
      now_following <- 0
      label <- label + 1
    }
  }
}
df <- cbind(foo,temp)

The result:

> df
   Vehicle.ID2 Frame.ID               State temp
1     3361-588     2110           Following    1
2     3361-588     2111           Following    1
3     3361-588     2112           Following    1
4     3361-588     2113           Following    1
5     3361-588     2114           Following    1
6     3361-588     2115           Following    1
7     3361-588     2116           Following    1
8     3361-588     2117           Following    1
9     3361-588     2118           Following    1
10    3361-588     2119 Approaching-fastveh    0
11    3361-588     2120 Approaching-fastveh    0
12    3361-588     2121 Approaching-fastveh    0
13    3361-588     2122           Following    2
14    3361-588     2123           Following    2
15    3361-588     2124           Following    2
16    3361-588     2125           Following    2
17    3361-588     2126           Following    2
18    3361-588     2127           Following    2
19    3361-588     2128           Following    2
20    3361-588     2129           Following    2
21    3361-588     2130           Following    2
22    3361-588     2131           Following    2
23    3361-588     2132           Following    2
24    3361-588     2133           Following    2
25    3361-588     2134           Following    2
26    3361-588     2135           Following    2
27    3361-588     2136           Following    2
28    3361-588     2137           Following    2
29    3361-588     2138           Following    2
30    3361-588     2139           Following    2
31    3361-588     2140 Approaching-fastveh    0
32    3361-588     2141 Approaching-fastveh    0
33    3361-588     2142 Approaching-fastveh    0
34    3361-588     2143 Approaching-fastveh    0
35    3361-588     2144 Approaching-fastveh    0
36    3361-588     2145 Approaching-fastveh    0
37    3361-588     2146 Approaching-fastveh    0

If you like, you can pack it into a function:

label_contiguous_strings <- function(string_vector, string) {
  # Function returns a vector of labels for the string vector input.
  len_vector <- length(string_vector)
  temp <- rep(0,len_vector)
  label <- 1
  now_following <- 0
  for (i in 1:len_vector) {
    if (string_vector[i] == "Following") {
      temp[i] <- label
      now_following <- 1
    } else {
      temp[i] <- 0
      if (now_following) {
        now_following <- 0
        label <- label + 1
      }
    }
  }
  return(temp)
}

Result:

> label_contiguous_strings(foo$State)
 [1] 1 1 1 1 1 1 1 1 1 0 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0 0

Upvotes: 1

r2evans

Reputation: 160862

As an alternative to @akrun's answer (I'm learning about rle, thanks!), here is a dplyr-esque suggestion:

foo %>%
  mutate(
    grp = cumsum(State == "Following" &
                 State != lag(State, default="")),
    grp = ifelse(State == "Following", grp, "-")
  )
# Source: local data frame [37 x 4]
#    Vehicle.ID2 Frame.ID               State   grp
#          <chr>    <int>               <chr> <chr>
# 1     3361-588     2110           Following     1
# 2     3361-588     2111           Following     1
# 3     3361-588     2112           Following     1
# 4     3361-588     2113           Following     1
# 5     3361-588     2114           Following     1
# ..         ...      ...                 ...   ...

Upvotes: 1

akrun

Reputation: 887851

We can use rle

inverse.rle(within.list(rle(foo$State=="Following"), {
               values1 <- values
               values1[values] <- seq_along(values[values])
               values1[!values] <- '.'
               values <- values1}))

If there are different 'Vehicle.ID2', we can use ave

with(foo, ave(State == "Following", Vehicle.ID2, FUN = function(x) {
           inverse.rle(within.list(rle(x), {
                     values1 <- values
                     values1[values] <- seq_along(values[values])
                     values1[!values] <- '.'
                     values <- values1
              }))
            })) 
#[1] "1" "1" "1" "1" "1" "1" "1" "1" "1" "." "." "." "2" "2" "2" "2" "2" 
#[19] "2" "2" "2" "2" "2" "2" "2" "2" "2" "2" "2" "2" "2" "." "." "." "." "." "." "."

Upvotes: 2

How to group a sequence of values in a column in r?

Goal

Data

Desired Output

What I tried

Answers (4)

Related Questions