Reputation: 1840
I know there are some answers here about splitting a string every nth character, such as this one and this one. However, these are quite question-specific and mostly deal with a single string rather than a data frame of multiple strings.
Example data
df <- data.frame(id = 1:2, seq = c('ABCDEFGHI', 'ZABCDJHIA'))
Looks like this:
  id       seq
1  1 ABCDEFGHI
2  2 ZABCDJHIA
Splitting on every third character
I want to split the string in each row every third character, such that the resulting data frame looks like this:
id   1   2   3
 1 ABC DEF GHI
 2 ZAB CDJ HIA
What I tried
I have used the splitstackshape package before to split a string on a single character, like so:
df %>% cSplit('seq', sep = '', stripWhite = FALSE, type.convert = FALSE)
I would love to have a similar function (or perhaps it is possible with cSplit) to split on every third character.
Upvotes: 4
Views: 1523
Reputation: 39727
In base R you can also split a string every x characters with read.fwf (Read Fixed Width Format Files), which needs either a file or a connection.
read.fwf(file=textConnection(as.character(df$seq)), widths=c(3,3,3))
   V1  V2  V3
1 ABC DEF GHI
2 ZAB CDJ HIA
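The widths above are hard-coded for nine-character strings. A minimal sketch of the same read.fwf idea with the widths derived from the data might look like the following. It assumes every string has the same length and that this length is a multiple of the chunk size; the names n, widths, pieces and res are introduced here for illustration only.

```r
# Example data from the question
df <- data.frame(id = 1:2, seq = c("ABCDEFGHI", "ZABCDJHIA"),
                 stringsAsFactors = FALSE)

n <- 3  # chunk size (assumption: string length is a multiple of n)

# One width per chunk, derived from the length of the first string
widths <- rep(n, nchar(df$seq[1]) %/% n)

# read.fwf needs a file or a connection, so wrap the strings in one
con <- textConnection(df$seq)
pieces <- read.fwf(con, widths = widths, colClasses = "character")
close(con)  # read.fwf does not close a connection that was already open

# Put the id column back next to the pieces (V1, V2, ...)
res <- cbind(df["id"], pieces)
res
```

Passing colClasses = "character" keeps chunks such as "T" or "123" from being converted to logical or numeric values.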
Upvotes: 1
Reputation: 887951
One option is separate() from tidyr, which accepts numeric split positions:
library(tidyverse)
df %>%
  separate(seq, into = paste0("x", 1:3), sep = c(3, 6))
# id x1 x2 x3
#1 1 ABC DEF GHI
#2 2 ZAB CDJ HIA
To make this more generic, compute the split positions and column names from the string length:
n1 <- nchar(as.character(df$seq[1])) - 3
s1 <- seq(3, n1, by = 3)
nm1 <- paste0("x", seq_len(length(s1) +1))
df %>%
  separate(seq, into = nm1, sep = s1)
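The position arithmetic can also be written in terms of an arbitrary chunk width. The following is only a sketch: chunk, len, cuts and nm are names introduced here, and it assumes all strings share the length of the first one and are longer than one chunk. The call to separate() is guarded so the position calculation can be inspected even when tidyr is not installed.

```r
# Example data from the question
df <- data.frame(id = 1:2, seq = c("ABCDEFGHI", "ZABCDJHIA"),
                 stringsAsFactors = FALSE)

chunk <- 3                 # characters per piece (assumed smaller than len)
len <- nchar(df$seq[1])    # assumes every string has this length

# Cut after every `chunk` characters, but never at the very end of the string
cuts <- seq(chunk, len - 1, by = chunk)

# One more column than there are cut positions
nm <- paste0("x", seq_len(length(cuts) + 1))

if (requireNamespace("tidyr", quietly = TRUE)) {
  res <- tidyr::separate(df, seq, into = nm, sep = cuts)
  print(res)
}
```

If the length is not a multiple of chunk, the last column simply holds the shorter remainder.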
Or, using base R, split the 'seq' column after every 3 characters with strsplit by passing a regex lookaround, which gives a list, and then rbind the list elements:
df[paste0("x", 1:3)] <- do.call(rbind,
    strsplit(as.character(df$seq), "(?<=.{3})", perl = TRUE))
NOTE: It is better to avoid column names that start with non-standard characters such as digits. For that reason, 'x' is prepended to each name.
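The base R approach can be parameterised on the chunk size by building the lookbehind pattern with sprintf. This is a sketch under the assumption that all strings have the same length, so that rbind yields a rectangular matrix; n, pattern, parts and res are illustrative names, not part of any package.

```r
# Example data from the question
df <- data.frame(id = 1:2, seq = c("ABCDEFGHI", "ZABCDJHIA"),
                 stringsAsFactors = FALSE)

n <- 3                                 # chunk size
pattern <- sprintf("(?<=.{%d})", n)    # lookbehind: split after every n characters

# strsplit returns one character vector per row; rbind stacks them into a matrix
parts <- do.call(rbind, strsplit(as.character(df$seq), pattern, perl = TRUE))
colnames(parts) <- paste0("x", seq_len(ncol(parts)))

# Combine the id column with the pieces, keeping them as character columns
res <- cbind(df["id"], as.data.frame(parts, stringsAsFactors = FALSE))
res
```

Unlike the assignment into df above, this version drops the original seq column and keeps only id plus the pieces.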
Upvotes: 4