I know the default setting for collapse in unnest_tokens is TRUE, but I'm confused about what the collapse argument actually means. I have read through the R documentation, but I'm still confused. Here's an example I wrote: is there any difference in the return value if I change collapse to TRUE?
library(dplyr)
library(tidytext)
bigram_freq <- tw %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2, collapse = FALSE)
The collapse argument controls whether the input text is combined across line breaks before it is tokenized. The documentation describes it as:
Whether to combine text with newlines first in case tokens (such as sentences or paragraphs) span multiple lines.
Check out the difference in behavior with collapse = TRUE compared to collapse = FALSE:
library(tidyverse)
library(tidytext)
emily <- tibble(text = c("Because I could not stop for Death -",
                         "He kindly stopped for me -"))
## notice the bigram "death he"
emily %>%
  unnest_tokens(word, text, token = "ngrams", n = 2, collapse = TRUE)
#> # A tibble: 11 x 1
#> word
#> <chr>
#> 1 because i
#> 2 i could
#> 3 could not
#> 4 not stop
#> 5 stop for
#> 6 for death
#> 7 death he
#> 8 he kindly
#> 9 kindly stopped
#> 10 stopped for
#> 11 for me
## notice no "death he"
emily %>%
  unnest_tokens(word, text, token = "ngrams", n = 2, collapse = FALSE)
#> # A tibble: 10 x 1
#> word
#> <chr>
#> 1 because i
#> 2 i could
#> 3 could not
#> 4 not stop
#> 5 stop for
#> 6 for death
#> 7 he kindly
#> 8 kindly stopped
#> 9 stopped for
#> 10 for me
Created on 2020-08-18 by the reprex package (v0.3.0.9001)
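To see the mechanism without tidytext, here is a rough base-R sketch. It assumes that collapse = TRUE behaves like pasting the rows together with a newline before tokenizing, which is what the documentation sentence quoted above suggests. make_bigrams() is a hypothetical helper, not part of tidytext, and its word splitting is much cruder than what unnest_tokens() does internally.

```r
# Hypothetical helper (not from tidytext): lowercase bigrams from one string,
# splitting on any run of characters that are not letters, digits or apostrophes
make_bigrams <- function(x) {
  words <- tolower(unlist(strsplit(x, "[^[:alnum:]']+")))
  words <- words[words != ""]  # drop the empty piece a leading delimiter would leave
  if (length(words) < 2) return(character(0))
  paste(words[-length(words)], words[-1])
}

text <- c("Because I could not stop for Death -",
          "He kindly stopped for me -")

# Analogue of collapse = FALSE: each row is tokenized on its own,
# so no bigram can cross from one row into the next
per_row <- unlist(lapply(text, make_bigrams))

# Analogue of collapse = TRUE: rows are joined with "\n" first,
# so "death" and "he" end up next to each other
joined <- make_bigrams(paste(text, collapse = "\n"))

length(per_row)  # [1] 10
length(joined)   # [1] 11
```

The counts line up with the reprex: 10 bigrams when each row is tokenized separately versus 11 when the rows are joined, the extra one being "death he".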