user3245256

Reputation: 1948

R: creating combinations of elements of a vector from left to right

I have a character vector 'x' that was created by splitting a longer string 'mystring' (the length of x is not known in advance).

mystring <- "this is my vector"
x <- strsplit(mystring, " ")[[1]]

I am looking for an elegant way of creating an object (e.g., a list) that contains the following strings:

string1
string1 + string2
string1 + string2 + string3
string1 + string2 + string3 + string4
string2
string2 + string3
etc. For the example above:

"this"
"this is"
"this is my"
"this is my vector"
"is"
"is my"
"is my vector"
"my"
"my vector"
"vector"

Thanks a lot!

Upvotes: 0

Views: 64

Answers (1)

Michael Griffiths

Reputation: 1427

It sounds like you want to construct n-grams. There are plenty of ways to do this; one option is the tokenizers package.

For example, suppose you want n-grams of length 1 through 4 (4 being the number of words in your example string).

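library(tidyverse)   # provides map() and %>%
library(tokenizers)
mystring <- "this is my vector"
# Tokenize into n-grams of each length 1..4, then flatten into one character vector
map(1:4, ~tokenize_ngrams(mystring, lowercase = FALSE, n = .x)) %>% 
  unlist()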
#>  [1] "this"              "is"                "my"               
#>  [4] "vector"            "this is"           "is my"            
#>  [7] "my vector"         "this is my"        "is my vector"     
#> [10] "this is my vector"
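
Note that the output above is grouped by n-gram length, whereas the question lists the strings grouped by starting word. If that ordering matters, or if you would rather not hard-code the upper bound of 4, here is a minimal base R sketch. It assumes x is the already-split vector from the question; the name runs is just a placeholder I picked for illustration.

```r
mystring <- "this is my vector"
x <- strsplit(mystring, " ")[[1]]

n <- length(x)  # works for any length, so no hard-coded upper bound

# For each start position i, paste together x[i..j] for every end position j >= i.
runs <- unlist(lapply(seq_along(x), function(i) {
  vapply(i:n, function(j) paste(x[i:j], collapse = " "), character(1))
}))

print(runs)
```

For a vector of n words this yields n * (n + 1) / 2 strings, starting with all runs that begin at the first word, then those that begin at the second word, and so on. If you need a list rather than a character vector, drop the unlist() call to get one list element per starting word.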

Upvotes: 2
