Reputation: 79
I have pairs of words that are transcribed in ARPABET. I am trying to combine these words such that every possible segment sequence, assuming strict ordering, is produced. An example would look like:
word1  transcription1  word2  transcription2
dog    D AA G          cat    K AE T
Combining transcription1 and transcription2 should produce something like the list below, which steps through the words segment by segment. For this toy example I have left out cases where no segment from the second word is used (i.e., dog + cat = dog), although they are probably part of the logical space.
D K AE T
D AE T
D T
D AA K AE T
D AA AE T
D AA T
D AA G K AE T
D AA G AE T
D AA G T
D AA G
K D AA G
K AA G
K G
K AE D AA G
K AE AA G
K AE G
K AE T D AA G
K AE T AA G
K AE T G
The eventual goal is to do some quantitative analysis on each of these outputs, so saving them to a large data frame would be ideal, although it might become unwieldy with the amount of data I am working with (~900 pairs of words, 3-7 segments each). Any help on this problem would be great.
Upvotes: 2
Views: 102
Reputation: 35554
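Here is a handmade function that uses only base R functions. It pairs every segment with NA, so each segment is either kept or dropped while the original order is preserved.
fun <- function(x, y){
  # Split each transcription into its segments
  x <- strsplit(x, " ")[[1]]
  y <- strsplit(y, " ")[[1]]
  # Pair every segment with NA (keep or drop), then enumerate all choices
  grid <- do.call(expand.grid, lapply(c(x, y), c, NA))
  # Paste the kept segments of each row back together, in order
  apply(grid, 1, function(row) paste(row[!is.na(row)], collapse = " "))
}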
fun("D AA G", "K AE T")
# [1] "D AA G K AE T" "AA G K AE T" "D G K AE T" "G K AE T"
# [5] "D AA K AE T" "AA K AE T" "D K AE T" "K AE T"
# [9] "D AA G AE T" "AA G AE T" "D G AE T" "G AE T"
# [13] "D AA AE T" "AA AE T" "D AE T" "AE T"
# [17] "D AA G K T" "AA G K T" "D G K T" "G K T"
# [21] "D AA K T" "AA K T" "D K T" "K T"
# [25] "D AA G T" "AA G T" "D G T" "G T"
# [29] "D AA T" "AA T" "D T" "T"
# [33] "D AA G K AE" "AA G K AE" "D G K AE" "G K AE"
# [37] "D AA K AE" "AA K AE" "D K AE" "K AE"
# [41] "D AA G AE" "AA G AE" "D G AE" "G AE"
# [45] "D AA AE" "AA AE" "D AE" "AE"
# [49] "D AA G K" "AA G K" "D G K" "G K"
# [53] "D AA K" "AA K" "D K" "K"
# [57] "D AA G" "AA G" "D G" "G"
# [61] "D AA" "AA" "D" ""
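The output above contains every ordered subsequence of the concatenated segments, which is broader than the list in the question. If only "a leading chunk of one word followed by a trailing chunk of the other" is wanted, the following is a minimal base R sketch. The name `prefix_suffix` and the `include_empty` argument are hypothetical, introduced here for illustration, and the reading of the question's list as prefix + suffix is an assumption.

```r
# Hypothetical helper (not from the original answer): every non-empty
# prefix of `first` followed by every non-empty suffix of `second`.
# Set include_empty = TRUE to also allow "no segment from `second`".
prefix_suffix <- function(first, second, include_empty = FALSE) {
  a <- strsplit(first, " ")[[1]]
  b <- strsplit(second, " ")[[1]]
  # "D", "D AA", "D AA G"
  prefixes <- vapply(seq_along(a),
                     function(i) paste(a[seq_len(i)], collapse = " "),
                     character(1))
  # "K AE T", "AE T", "T"
  suffixes <- vapply(seq_along(b),
                     function(j) paste(b[j:length(b)], collapse = " "),
                     character(1))
  if (include_empty) suffixes <- c(suffixes, "")
  # expand.grid varies its first column fastest, so suffixes cycle
  # within each prefix
  grid <- expand.grid(suffix = suffixes, prefix = prefixes,
                      stringsAsFactors = FALSE)
  # trimws() drops the trailing space left by an empty suffix
  trimws(paste(grid$prefix, grid$suffix))
}

prefix_suffix("D AA G", "K AE T")
# [1] "D K AE T"      "D AE T"        "D T"           "D AA K AE T"
# [5] "D AA AE T"     "D AA T"        "D AA G K AE T" "D AA G AE T"
# [9] "D AA G T"

# Both orders, as in the question's list
both <- c(prefix_suffix("D AA G", "K AE T"),
          prefix_suffix("K AE T", "D AA G"))
```

With `include_empty = TRUE` each direction yields 12 strings instead of 9, because every prefix can also stand alone.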
Upvotes: 2
Reputation: 9705
Here's a simple function to do so:
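library(dplyr)  # used here only for the %>% pipe
segment_sequences <- function(x, y) {
  # Split each transcription into segments and concatenate them
  x <- strsplit(x, " ") %>% unlist
  y <- strsplit(y, " ") %>% unlist
  z <- c(x, y)
  # For each length j, take every ordered choice of j positions in z.
  # lapply always returns a list, which do.call() requires.
  lapply(seq_along(z), function(j) {
    combos <- combn(seq_along(z), j, simplify = FALSE)
    sapply(combos, function(cb) paste0(z[cb], collapse = " "))
  }) %>% do.call(c, .)
}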
segment_sequences("D AA G","K AE T")
[1] "D" "AA" "G" "K" "AE" "T" "D AA" "D G" "D K" "D AE" "D T" "AA G" "AA K" "AA AE" "AA T" "G K" "G AE"
[18] "G T" "K AE" "K T" "AE T" "D AA G" "D AA K" "D AA AE" "D AA T" "D G K" "D G AE" "D G T" "D K AE" "D K T" "D AE T" "AA G K" "AA G AE" "AA G T"
[35] "AA K AE" "AA K T" "AA AE T" "G K AE" "G K T" "G AE T" "K AE T" "D AA G K" "D AA G AE" "D AA G T" "D AA K AE" "D AA K T" "D AA AE T" "D G K AE" "D G K T" "D G AE T" "D K AE T"
[52] "AA G K AE" "AA G K T" "AA G AE T" "AA K AE T" "G K AE T" "D AA G K AE" "D AA G K T" "D AA G AE T" "D AA K AE T" "D G K AE T" "AA G K AE T" "D AA G K AE T"
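To get from a single pair to the data frame mentioned in the question, one option is to apply the same idea row by row and stack the results. This is only a sketch in base R: the `pairs` data frame, its column names, and the helper `all_subsequences` are assumptions made up for illustration, and the second word pair is invented toy data.

```r
# Hypothetical input: one row per word pair, segments separated by spaces
pairs <- data.frame(
  word1 = c("dog", "cat"),
  transcription1 = c("D AA G", "K AE T"),
  word2 = c("cat", "bee"),
  transcription2 = c("K AE T", "B IY"),
  stringsAsFactors = FALSE
)

# All ordered, non-empty subsequences of the concatenated segments
all_subsequences <- function(x, y) {
  z <- c(strsplit(x, " ")[[1]], strsplit(y, " ")[[1]])
  unlist(lapply(seq_along(z), function(k) {
    combos <- combn(seq_along(z), k, simplify = FALSE)
    vapply(combos, function(idx) paste(z[idx], collapse = " "),
           character(1))
  }))
}

# One long data frame: each row is one generated sequence for one pair
result <- do.call(rbind, lapply(seq_len(nrow(pairs)), function(i) {
  data.frame(
    word1 = pairs$word1[i],
    word2 = pairs$word2[i],
    sequence = all_subsequences(pairs$transcription1[i],
                                pairs$transcription2[i]),
    stringsAsFactors = FALSE
  )
}))

nrow(result)
# [1] 94
```

A pair with n segments in total yields 2^n - 1 non-empty subsequences (63 for dog + cat, 31 for the toy second pair). At the upper end described in the question, two 7-segment words give 2^14 - 1 = 16383 rows, so about 900 such pairs would come to at most roughly 14.7 million rows. Repeated segments within a pair produce duplicate strings, which could be dropped with `unique()` if they are not wanted.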
Upvotes: 3