Rcoding

Reputation: 357

Make all possible string combinations out of input sentences

To create all possible string combinations from a set of input sentences, I wrote the code below.

library(stringr)
text = c('I like you', 'I love you so much', 'she like it so much', 'she hate you', 'he hate you so much','I like him')
tex = data.frame(text)

library(splitstackshape)
pattern = data.frame(cSplit(tex, "text", " "))

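n = ncol(pattern)

# Collect the unique words found at each word position
dat = c()
for(i in 1:n){
  tt = unique(pattern[,i])
  g = paste0(tt, collapse = ' ')
  dat = c(dat, g)
}
# Build the data frame once, after the loop has finished
SEQ = data.frame(dat)

SEQ = data.frame(cSplit(SEQ, "dat", " "))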

This produces the following data frame.

  dat_1 dat_2 dat_3
1     I   she    he
2  like  love  hate
3   you    it   him
4  <NA>    so  <NA>
5  <NA>  much  <NA>

What I want is to create all possible combinations of these words (108 in total), as shown below.

I like you so NA 
I like you so much 
I like you NA NA 
I like you NA much 
...
he love him so much 
he love him NA NA 
he love him NA much 
he hate you so NA 
he hate you so much 
...

How can I generate this list of combinations?

Upvotes: 1

Views: 84

Answers (2)

Henrik

Reputation: 67778

I think data.table::tstrsplit is convenient for splitting and transposing. Then select the unique values of each list element (lapply(x, unique)) and generate all combinations with expand.grid.

expand.grid(lapply(data.table::tstrsplit(text, split = " "), unique))

 #       Var1 Var2 Var3 Var4 Var5
 #   1      I like  you <NA> <NA>
 #   2    she like  you <NA> <NA>
 #   3     he like  you <NA> <NA>
 #   4      I love  you <NA> <NA>
 #   5    she love  you <NA> <NA>
 #   [snip]
 #   104  she love  him   so much
 #   105   he love  him   so much
 #   106    I hate  him   so much
 #   107  she hate  him   so much
 #   108   he hate  him   so much
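
For readers who prefer to avoid the data.table dependency, the same three steps (split, take the unique values per word position, combine) can be sketched in base R. This is only an illustration, not part of the original answer; the names words, max_len, padded, choices and combos are made up for the example.

```r
text <- c('I like you', 'I love you so much', 'she like it so much',
          'she hate you', 'he hate you so much', 'I like him')

# Split each sentence into its words.
words <- strsplit(text, " ", fixed = TRUE)

# Pad every sentence with NA up to the longest sentence,
# giving a matrix with one row per sentence.
max_len <- max(lengths(words))
padded <- t(vapply(words, function(w) w[seq_len(max_len)],
                   character(max_len)))

# Unique values (NA included) at each word position.
choices <- lapply(seq_len(max_len), function(j) unique(padded[, j]))

# One row per combination; columns are named Var1 ... Var5.
combos <- expand.grid(choices, stringsAsFactors = FALSE)
nrow(combos)
```

The row count is the product of the number of choices per position: 3 * 3 * 3 * 2 * 2 = 108.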

You may also use the data.table equivalent of expand.grid, CJ, which has a unique argument.

library(data.table)
do.call(CJ, c(tstrsplit(text, split = " "), unique = TRUE))

#       V1   V2  V3   V4   V5
#   1:   I hate him <NA> <NA>
#   2:   I hate him <NA> much
#   3:   I hate him   so <NA>
#   4:   I hate him   so much
#   5:   I hate  it <NA> <NA>
# ---                       
# 104: she love  it   so much
# 105: she love you <NA> <NA>
# 106: she love you <NA> much
# 107: she love you   so <NA>
# 108: she love you   so much
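
If the goal is one string per combination, as in the question, the rows can be pasted together afterwards. The following is a minimal sketch under that assumption; sentences and sentences_no_na are hypothetical names, and the second variant (dropping the NA words) is only a guess at what may also be useful.

```r
library(data.table)

text <- c('I like you', 'I love you so much', 'she like it so much',
          'she hate you', 'he hate you so much', 'I like him')

combos <- do.call(CJ, c(tstrsplit(text, split = " "), unique = TRUE))

# paste() renders missing values as the literal string "NA",
# which matches the format shown in the question.
sentences <- do.call(paste, as.list(combos))

# Variant (an assumption, not requested in the question):
# drop the NA words instead of printing them.
sentences_no_na <- apply(as.matrix(combos), 1,
                         function(r) paste(r[!is.na(r)], collapse = " "))

head(sentences)
```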

Upvotes: 2

akrun

Reputation: 887098

Starting from the "pattern" data frame, we can also use expand from tidyr:

library(tidyr)
expand(pattern, !!! rlang::syms(names(pattern)))
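
To make the `!!!` splice less opaque: it converts the column names into symbols and passes them to expand as if they had been typed out one by one. A small sketch on a toy data frame (df, spliced and explicit are hypothetical names used only here) illustrates the equivalence:

```r
library(tidyr)

# Toy data frame, used only for illustration.
df <- data.frame(a = c("I", "she"), b = c("like", NA),
                 stringsAsFactors = FALSE)

# Splice every column name into expand() ...
spliced <- expand(df, !!!rlang::syms(names(df)))

# ... which is the same as listing the columns by hand.
explicit <- expand(df, a, b)

isTRUE(all.equal(spliced, explicit))
```

Splicing is mainly convenient when the number of columns is not known in advance, as with sentences of varying length.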

Alternatively, we can combine separate with expand:

library(tidyverse)
mx <- max(str_count(tex$text, "\\w+"))

tex %>% 
  separate(text, into = paste0("dat_", seq_len(mx)), fill = "right") %>%  # pad short sentences with NA
  expand(!!! rlang::syms(names(.)))

Upvotes: 2
