I have a character vector 'x' derived from a longer string 'mystring' (in practice, the length of x is not known in advance):
mystring <- "this is my vector"
x <- strsplit(mystring, " ")[[1]]
I am looking for an elegant way of creating an object (e.g., a list) that contains the following strings:
string1
string1 + string2
string1 + string2 + string3
string1 + string2 + string3 + string4
string2
string2 + string3
etc. For this example, the desired result is:
"this"
"this is"
"this is my"
"this is my vector"
"is"
"is my"
"is my vector"
"my"
"my vector"
"vector"
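The list above enumerates every contiguous run x[i..j], ordered first by the start word i and then by the end word j, so a vector of length n yields n(n + 1)/2 strings (10 for this example). A minimal base-R sketch of that rule, assuming words are separated by single spaces as in the `strsplit` call above (`contiguous_phrases` is a hypothetical helper name, not from any package):

```r
# Input from the question
mystring <- "this is my vector"
x <- strsplit(mystring, " ")[[1]]

# Hypothetical helper: every contiguous run x[i..j],
# ordered by start index i, then by end index j
contiguous_phrases <- function(x) {
  n <- length(x)
  unlist(lapply(seq_len(n), function(i) {
    # For a fixed start i, join x[i..j] for each end j from i to n
    vapply(i:n, function(j) paste(x[i:j], collapse = " "), character(1))
  }))
}

res <- contiguous_phrases(x)
print(res)
```

Because it loops over `length(x)`, this works for any number of words without hard-coding the maximum phrase length.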
Thanks a lot!
It sounds like you want to construct n-grams! There are plenty of ways to do this; one option is the tokenizers package.
For example, let's say you want n-grams of length 1 through 4:
library(tidyverse)
library(tokenizers)
mystring <- "this is my vector"
map(1:4, ~ tokenize_ngrams(mystring, lowercase = FALSE, n = .x)) %>%
  unlist()
#> [1] "this" "is" "my"
#> [4] "vector" "this is" "is my"
#> [7] "my vector" "this is my" "is my vector"
#> [10] "this is my vector"
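Note that this output is grouped by n-gram length rather than by starting word, and the range 1:4 is fixed. If you would rather avoid the extra packages, here is a minimal base-R sketch that produces the same length-grouped order while taking the maximum length from `length(x)`, which matters since the length of x is not known in advance. It assumes single-space word separation and, unlike tokenizers, does no lowercasing or punctuation stripping; `ngrams_by_length` is a hypothetical helper name:

```r
mystring <- "this is my vector"
x <- strsplit(mystring, " ")[[1]]

# Hypothetical helper: all n-grams for n = 1, ..., length(x),
# grouped by n, then by start position
ngrams_by_length <- function(x) {
  k <- length(x)
  unlist(lapply(seq_len(k), function(n) {
    # Valid start positions for an n-gram of length n
    starts <- seq_len(k - n + 1)
    vapply(starts, function(i) paste(x[i:(i + n - 1)], collapse = " "), character(1))
  }))
}

res <- ngrams_by_length(x)
print(res)
```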