Reputation: 509
I have a list of words that I want to group into sentences. The data is currently in this format:
df <- data_frame(word = c("I'm", "going", "to", "be", "sick", "I", "want", "to", "go", "home"),
                 stop = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE))
I want to number the sentences sequentially in a new column, starting a new sentence after each row where stop is TRUE, so that the data looks like this:
df2 <- data_frame(word = c("I'm", "going", "to", "be", "sick", "I", "want", "to", "go", "home"),
                  stop = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
                  num = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2))
Any quick ways to do this? Thanks!
Upvotes: 2
Views: 78
Reputation: 6532
library(tidyverse)
df %>% mutate(num = cumsum(lag(stop, default = FALSE))+1)
# A tibble: 10 x 3
   word  stop    num
   <chr> <lgl> <dbl>
 1 I'm   FALSE    1.
 2 going FALSE    1.
 3 to    FALSE    1.
 4 be    FALSE    1.
 5 sick  TRUE     1.
 6 I     FALSE    2.
 7 want  FALSE    2.
 8 to    FALSE    2.
 9 go    FALSE    2.
10 home  TRUE     2.
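For anyone wondering why the lag is needed: the counter should only go up on the word after a stop, not on the stop word itself. Here is a rough base R sketch of the same shift-then-accumulate idea, without dplyr. The names stop_flag, shifted and num are just assumptions for illustration, not part of the answer above.

```r
# Assumed example flags: TRUE marks the last word of a sentence
stop_flag <- c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)

# Shift the flags one position to the right; the first word has no
# predecessor, so it gets FALSE (this mirrors lag(stop, default = FALSE))
shifted <- c(FALSE, head(stop_flag, -1))

# The running total counts sentences already finished; add 1 for a 1-based label
num <- cumsum(shifted) + 1
num
# [1] 1 1 1 1 1 2 2 2 2 2
```

Without the shift, cumsum(stop_flag) + 1 would already label "sick" as sentence 2.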
Upvotes: 4
Reputation: 43169
You could easily write a small function:
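library(dplyr)

# Label each element with its sentence number; a TRUE in x ends the current sentence
sent <- function(x) {
  result <- numeric(length(x))
  n <- 1
  for (i in seq_along(x)) {
    result[i] <- n
    # A stop word closes the sentence, so the next word starts a new one
    # (isTRUE() treats NA as "not a stop" instead of raising an error)
    if (isTRUE(x[i])) {
      n <- n + 1
    }
  }
  result
}

df %>%
  mutate(num = sent(stop))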
Which yields:
# A tibble: 10 x 3
   word  stop    num
   <chr> <lgl> <dbl>
 1 I'm   FALSE    1.
 2 going FALSE    1.
 3 to    FALSE    1.
 4 be    FALSE    1.
 5 sick  TRUE     1.
 6 I     FALSE    2.
 7 want  FALSE    2.
 8 to    FALSE    2.
 9 go    FALSE    2.
10 home  TRUE     2.
Upvotes: 0
Reputation: 11490
This works with your data, though I'm not sure it is general enough for your use case:
df$num <- df$stop %>% dplyr::lag(default = FALSE) %>% cumsum() %>% {. + 1}
> df
# # A tibble: 10 x 3
#    word  stop    num
#    <chr> <lgl> <dbl>
#  1 I'm   FALSE    1.
#  2 going FALSE    1.
#  3 to    FALSE    1.
#  4 be    FALSE    1.
#  5 sick  TRUE     1.
#  6 I     FALSE    2.
#  7 want  FALSE    2.
#  8 to    FALSE    2.
#  9 go    FALSE    2.
# 10 home  TRUE     2.
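On the question of generality, one way to probe it is to try inputs that differ from the example: two stops in a row, no stop on the last word, or an empty vector. The sketch below uses base R only; label_sentences and is_stop are hypothetical names made up for this illustration, and the empty-input guard is an assumption about what you would want there.

```r
# Hypothetical helper: number sentences from a logical "ends a sentence" vector
label_sentences <- function(is_stop) {
  # Assumed behavior: an empty input yields an empty result
  if (length(is_stop) == 0) return(numeric(0))
  # Drop the last flag and prepend FALSE, i.e. shift right by one position
  shifted <- c(FALSE, is_stop[-length(is_stop)])
  cumsum(shifted) + 1
}

# Two stops in a row, and no stop on the final word
label_sentences(c(FALSE, TRUE, TRUE, FALSE, FALSE))
# [1] 1 1 2 3 3
```

So a stop word directly after another stop word becomes a one-word sentence of its own, and trailing words without a closing stop still get a label. Whether that is the behavior you want depends on your data.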
Upvotes: 1