Reputation: 509

Sequentially label sentences in R?

I have a list of words that I want to group into sentences. The data is currently in this format:

df <- tibble::tibble(word = c("I'm", "going", "to", "be", "sick", "I", "want", "to", "go", "home"),
                     stop = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE))

I want to number the sentences sequentially in a new column, where each sentence ends at a row with stop equal to TRUE, so that the data looks like this:

df2 <- tibble::tibble(word = c("I'm", "going", "to", "be", "sick", "I", "want", "to", "go", "home"),
                      stop = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
                      num = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2))

Is there a quick way to do this? Thanks!

Upvotes: 2

Answers (3)

Stephen Henderson

Reputation: 6532

library(tidyverse)
df %>% mutate(num = cumsum(lag(stop, default = FALSE)) + 1)
    # A tibble: 10 x 3
       word  stop    num
       <chr> <lgl> <dbl>
     1 I'm   FALSE    1.
     2 going FALSE    1.
     3 to    FALSE    1.
     4 be    FALSE    1.
     5 sick  TRUE     1.
     6 I     FALSE    2.
     7 want  FALSE    2.
     8 to    FALSE    2.
     9 go    FALSE    2.
    10 home  TRUE     2.
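
To see why the one-liner above works, here is a minimal base R sketch of the same shift-then-cumsum idea, with no packages loaded. The variable names stop_flag, shifted and num are only illustrative, and the sketch assumes the vector has at least one element and no NA values.

```r
# Same flags as the stop column in the question.
stop_flag <- c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)

# Shift the flags down by one position: prepend FALSE and drop the last element.
# This mirrors what lag(stop, default = FALSE) does in the dplyr version.
shifted <- c(FALSE, head(stop_flag, -1))

# cumsum() counts how many stops occurred strictly before each word;
# adding 1 makes the first sentence number 1 instead of 0.
num <- cumsum(shifted) + 1

print(num)
```

The shift matters because the stop word itself still belongs to the sentence it closes; without it, "sick" would already be labelled 2.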

Upvotes: 4

Jan

Reputation: 43169

You could easily write a small function:

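sent <- function(x) {
  # x: logical vector, TRUE marks the last word of a sentence
  result <- numeric(length(x))
  n <- 1
  for (i in seq_along(x)) {
    result[i] <- n
    # a stop word closes the current sentence, so the next word starts a new one
    if (isTRUE(x[i])) {
      n <- n + 1
    }
  }
  result
}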

df %>%
  mutate(num = sent(stop))

Which yields:

# A tibble: 10 x 3
   word  stop    num
   <chr> <lgl> <dbl>
 1 I'm   FALSE    1.
 2 going FALSE    1.
 3 to    FALSE    1.
 4 be    FALSE    1.
 5 sick  TRUE     1.
 6 I     FALSE    2.
 7 want  FALSE    2.
 8 to    FALSE    2.
 9 go    FALSE    2.
10 home  TRUE     2.
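
As a quick sanity check, the loop can be compared against a vectorized formulation. The sketch below is only illustrative: sent_vec is a hypothetical helper name, the copy of sent uses the same logic as the function above, and NA values in the input are assumed not to occur.

```r
# Copy of the loop-based labeller, so this block runs on its own.
sent <- function(x) {
  result <- numeric(length(x))
  n <- 1
  for (i in seq_along(x)) {
    result[i] <- n
    if (isTRUE(x[i])) n <- n + 1
  }
  result
}

# Vectorized equivalent (hypothetical helper): cumsum(x) counts stops up to and
# including the current word, so subtracting x excludes the current word's own stop.
sent_vec <- function(x) cumsum(x) - x + 1

stop_flag <- c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)
back_to_back <- c(FALSE, TRUE, TRUE, FALSE)  # two stop words in a row

# Both versions should agree on the question's data and on the edge cases.
stopifnot(all(sent(stop_flag) == sent_vec(stop_flag)))
stopifnot(all(sent(back_to_back) == sent_vec(back_to_back)))
stopifnot(length(sent(logical(0))) == 0, length(sent_vec(logical(0))) == 0)
```

Note that with two stop words in a row, the second one forms a one-word sentence of its own under either version.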

Upvotes: 0

Andre Elrico

Reputation: 11490

This works with your data, though I'm not sure it is general enough for your real use case:

df$num <- df$stop %>% dplyr::lag(default = FALSE) %>% cumsum %>% {. + 1}

> df
# # A tibble: 10 x 3
# word  stop    num
#   <chr> <lgl> <dbl>
# 1 I'm   FALSE    1.
# 2 going FALSE    1.
# 3 to    FALSE    1.
# 4 be    FALSE    1.
# 5 sick  TRUE     1.
# 6 I     FALSE    2.
# 7 want  FALSE    2.
# 8 to    FALSE    2.
# 9 go    FALSE    2.
#10 home  TRUE     2.

Upvotes: 1
