Reputation: 803
I'm trying to write a function that builds a matrix by splitting a character vector repeatedly using successive elements in a vector of patterns.
Let's call the function I'm trying to write str_split_vector(). Here's an example of the output I'm looking for:
char <- c("A & P | B & C @ D",
          "E & Q | F & G @ H",
          "I & R | J & K @ L")
splits <- c(" \\| ", " & ", " @ ")
str_split_vector(char, splits)
#      [,1]    [,2] [,3] [,4]
# [1,] "A & P" "B"  "C"  "D"
# [2,] "E & Q" "F"  "G"  "H"
# [3,] "I & R" "J"  "K"  "L"
The char vector is split by each pattern in turn, leaving "A & P" intact. (Although it might be easiest to manage that last bit with particular regex patterns.)
I've been able to accomplish this task only iteratively, with a pretty ad hoc loop:
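for (ii in seq_along(splits)) {
  if (ii == 1) {
    # Start from a one-column matrix and split it on the first pattern.
    char_mat <- matrix(char)
    char_mat <- do.call(rbind, strsplit(char_mat[, ii], splits[ii]))
  } else {
    # Keep the finished columns and split only the last column.
    # seq_len(ii - 1) avoids the precedence trap in 1:ii - 1, which parses as (1:ii) - 1.
    char_mat <- cbind(char_mat[, seq_len(ii - 1), drop = FALSE],
                      do.call(rbind, strsplit(char_mat[, ii], splits[ii])))
  }
}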
That process looks inefficient to me, since I'm "growing" char_mat with the repeated cbind() calls. Even worse, I find it almost impossible to understand what's going on without actually running the code.
Is there a simpler way to write this, potentially ignoring the requirement that "A & P" not be split?
Upvotes: 1
Views: 757
Reputation: 76402
Maybe the following is what you want. It uses no loops: the patterns are joined into a single alternation, so "A & P" is split as well.
str_split_vector <- function(x, y){
  s <- strsplit(x, paste(y, collapse = "|"))
  do.call(rbind, s)
}
str_split_vector(char, splits)
#     [,1] [,2] [,3] [,4] [,5]
#[1,] "A"  "P"  "B"  "C"  "D"
#[2,] "E"  "Q"  "F"  "G"  "H"
#[3,] "I"  "R"  "J"  "K"  "L"
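Because the combined pattern above also splits "A & P", here is a rough sketch of one way to keep that piece intact: apply the patterns one at a time with Reduce(), always splitting only the last column. This is just the question's loop condensed, not a different algorithm. It assumes every string yields the same number of pieces for a given pattern, and the argument names x and patterns are my own choice.

```r
char <- c("A & P | B & C @ D",
          "E & Q | F & G @ H",
          "I & R | J & K @ L")
splits <- c(" \\| ", " & ", " @ ")

# Sketch: split only the last column on each successive pattern.
# Assumes each pattern produces the same number of pieces in every row.
str_split_vector <- function(x, patterns) {
  Reduce(
    function(mat, pattern) {
      last <- ncol(mat)
      # Split the last column; each row becomes a vector of pieces.
      new_cols <- do.call(rbind, strsplit(mat[, last], pattern))
      # Keep the earlier columns untouched and append the new pieces.
      cbind(mat[, -last, drop = FALSE], new_cols)
    },
    patterns,
    matrix(x, ncol = 1)  # initial value: a one-column matrix
  )
}

result <- str_split_vector(char, splits)
```

With the example data, result should be a 3 x 4 character matrix whose first column still holds "A & P", "E & Q" and "I & R". The matrix is still rebuilt once per pattern, so this is mainly a readability change, not a performance one.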
An approach that uses capture groups and won't perform any splitting on the first & is the following:
do.call(rbind, strsplit(gsub("(.*) \\| (.*) & (.*) @ (.*)", "\\1_\\2_\\3_\\4", char), "_"))
It basically replaces the separators you wish to split on with an underscore and then splits on those underscores, so it assumes the data themselves contain no underscores.
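If the placeholder character is a concern, a possible variation is to build the same grouped pattern from the splits vector and pull the capture groups out directly with regexec() and regmatches(), both base R. This is only a sketch: it assumes every string matches the full pattern and that each separator occurs in the order given.

```r
char <- c("A & P | B & C @ D",
          "E & Q | F & G @ H",
          "I & R | J & K @ L")
splits <- c(" \\| ", " & ", " @ ")

# Build "^(.*) \\| (.*) & (.*) @ (.*)$" from the separators.
pattern <- paste0("^(.*)", paste(splits, collapse = "(.*)"), "(.*)$")

# regexec() records the full match plus each capture group;
# regmatches() turns those positions into character vectors.
matches <- regmatches(char, regexec(pattern, char))

# Drop the first element (the full match) and stack the groups row-wise.
# Assumption: every string matched, so each element has the same length.
result <- do.call(rbind, lapply(matches, function(m) m[-1]))
```

Here result should again be a 3 x 4 matrix with "A & P" intact in the first column, and no underscore (or any other sentinel) ever enters the data.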
Upvotes: 3