Reputation: 764
I would like to split a character string into substrings, using a second numeric vector as the splitting points.
vec <- "LAYRVCMTNEGHPWVSLVVQKTRLQISQDPSLNYEYLPTMGLKSFIQASLALLFGKHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHKDARIVYIISSQKELHGLVFQDMGFTVYEYSVWDPKKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM"
split.points <- c(25, 32, 55, 90, 124)
I would like to cut the string above at the positions given in the split.points vector, producing six substrings.
It sounds very simple, but the splitting functions I know work only with a regex pattern or with a fixed substring length.
I would appreciate any help.
Upvotes: 3
Views: 1674
Reputation: 83215
Another alternative is to use read.fwf:
# widths are the gaps between consecutive cut points; the last one covers the rest of the string
unlist(read.fwf(textConnection(vec),
                widths = diff(c(0, split.points, nchar(vec))),
                as.is = TRUE),
       use.names = FALSE)
which gives:
[1] "LAYRVCMTNEGHPWVSLVVQKTRLQ" "ISQDPSL"
[3] "NYEYLPTMGLKSFIQASLALLFG" "KHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHK"
[5] "DARIVYIISSQKELHGLVFQDMGFTVYEYSVWDP" "KKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM"
I wouldn't be surprised if your character vector originates from a data file. In that case read.fwf would be especially useful, because it can read fixed-width records directly. An example:
vec2 <- "LAYRVCMTNEGHPWVSLVVQKTRLQISQDPSLNYEYLPTMGLKSFIQASLALLFGKHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHKDARIVYIISSQKELHGLVFQDMGFTVYEYSVWDPKKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM
LAYRVCMTNEGHPWVSLVVQKTRLQISQDPSLNYEYLPTMGLKSFIQASLALLFGKHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHKDARIVYIISSQKELHGLVFQDMGFTVYEYSVWDPKKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM"
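# each line of vec2 has the same length as vec, so nchar(vec) gives the end of the last field
read.fwf(textConnection(vec2),
         widths = diff(c(0, split.points, nchar(vec))),
         as.is = TRUE)
which will give:
                         V1      V2                      V3                                  V4                                 V5                                           V6
1 LAYRVCMTNEGHPWVSLVVQKTRLQ ISQDPSL NYEYLPTMGLKSFIQASLALLFG KHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHK DARIVYIISSQKELHGLVFQDMGFTVYEYSVWDP KKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM
2 LAYRVCMTNEGHPWVSLVVQKTRLQ ISQDPSL NYEYLPTMGLKSFIQASLALLFG KHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHK DARIVYIISSQKELHGLVFQDMGFTVYEYSVWDP KKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM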
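If the records really do live in a file, the same idea might look like the sketch below. The toy records, the temporary file, and the variable names are made up for illustration and are not part of the question; colClasses = "character" is passed through to read.table so that no field gets converted to another type.

```r
# Toy fixed-width records (made up for illustration), each 9 characters wide
records <- c("AAABBCCCC", "DDDEEFFFF")
split.points <- c(3, 5)            # cut after positions 3 and 5
rec.len <- nchar(records[1])       # record length, here 9

# Write the records to a temporary file to stand in for a real data file
tmp <- tempfile(fileext = ".txt")
writeLines(records, tmp)

# Field widths: gaps between consecutive cut points, including start and end
widths <- diff(c(0, split.points, rec.len))   # 3 2 4

# Read every field as character so nothing is type-converted
res <- read.fwf(tmp, widths = widths, colClasses = "character")
unlink(tmp)
res
```

Each row of res holds the pieces of one record, in columns V1 to V3.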
Upvotes: 4
Reputation: 887118
We can use separate from tidyr:
library(tidyverse)
tibble(vec) %>%
  separate(vec, into = paste0('vec', 1:6), sep = split.points) %>%
  unlist(., use.names = FALSE)
#[1] "LAYRVCMTNEGHPWVSLVVQKTRLQ" "ISQDPSL" "NYEYLPTMGLKSFIQASLALLFG"
#[4] "KHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHK" "DARIVYIISSQKELHGLVFQDMGFTVYEYSVWDP"
#[6] "KKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM"
A base R option would be substr:
unname(mapply(substr, vec, start = c(1, split.points+1), stop = c(split.points, nchar(vec))))
#[1] "LAYRVCMTNEGHPWVSLVVQKTRLQ" "ISQDPSL" "NYEYLPTMGLKSFIQASLALLFG"
#[4] "KHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHK" "DARIVYIISSQKELHGLVFQDMGFTVYEYSVWDP" "KKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM"
Upvotes: 3
Reputation: 17289
We can try substring:
substring(
vec,
c(1, split.points + 1),
c(split.points, nchar(vec))
)
# [1] "LAYRVCMTNEGHPWVSLVVQKTRLQ" "ISQDPSL"
# [3] "NYEYLPTMGLKSFIQASLALLFG" "KHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHK"
# [5] "DARIVYIISSQKELHGLVFQDMGFTVYEYSVWDP" "KKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM"
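If this is needed in several places, the substring call could be wrapped in a small helper. The following is only a sketch: the name split_at and the handling of out-of-range cut points are assumptions of this example, not something from the question.

```r
# Hypothetical helper: split a single string after each position in `points`
split_at <- function(x, points) {
  stopifnot(is.character(x), length(x) == 1L)
  n <- nchar(x)
  # sort the cut points and drop duplicates
  points <- sort(unique(points))
  # keep only cut points that fall strictly inside the string
  points <- points[points >= 1 & points < n]
  # starts: 1 and the position after each cut; ends: each cut and the string end
  substring(x, c(1, points + 1), c(points, n))
}

pieces <- split_at("abcdefghij", c(3, 7))
pieces
# [1] "abc"  "defg" "hij"

# The pieces should paste back together into the original string
identical(paste(pieces, collapse = ""), "abcdefghij")
# [1] TRUE
```

With the data from the question, split_at(vec, split.points) should return the same six substrings as the substring call above.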
Upvotes: 7