Reputation: 1277
I'm looking for an elegant way to split and convert semicolon-separated values from two columns into rows. Data looks like as:
1 A0A024R442;E7ETB3;Q9ULA0;E7EMB6 443;465;447;382
2 A0A024R442;E7ETB3;Q9ULA0;E7EMB6;C9JBE1 323;345;327;262;309
The required output is:
A0A024R442 443
E7ETB3 465
Q9ULA0 447
...................
C9JBE1 309
The number of elements in columns is always equal per row as depicted above. I have tried the strsplit() function but I think I can't split delimited values on multiple columns simultaneously (please correct me if I'm wrong).
I would appreciate any suggestion.
Upvotes: 0
Views: 89
Reputation: 26343
Here is a data.table
option
library(data.table)
DT <- fread(text, header = FALSE)
out <- DT[, lapply(.SD, tstrsplit, split = ";", fixed = TRUE), by = 1:nrow(DT)]
out[, nrow := NULL][]
# V1 V2
#1: A0A024R442 443
#2: E7ETB3 465
#3: Q9ULA0 447
#4: E7EMB6 382
#5: A0A024R442 323
#6: E7ETB3 345
#7: Q9ULA0 327
#8: E7EMB6 262
#9: C9JBE1 309
Or in baseR
using strsplit
instead
DF <- read.table(text = text, stringsAsFactors = FALSE, header = FALSE)
as.data.frame(lapply(DF, function(x) unlist(strsplit(x, split = ";", fixed = TRUE))))
data
text = "A0A024R442;E7ETB3;Q9ULA0;E7EMB6 443;465;447;382
A0A024R442;E7ETB3;Q9ULA0;E7EMB6;C9JBE1 323;345;327;262;309"
Upvotes: 2
Reputation: 887088
The separate_rows
can be used as well
library(tidyverse)
separate_rows(df1, c("V1", "V2"))
# V1 V2
#1 A0A024R442 443
#2 E7ETB3 465
#3 Q9ULA0 447
#4 E7EMB6 382
#5 A0A024R442 323
#6 E7ETB3 345
#7 Q9ULA0 327
#8 E7EMB6 262
#9 C9JBE1 309
Or using cSplit
library(splitstackshape)
cSplit(df1, c("V1", "V2"), ";", "long")
df1 <- structure(list(V1 = c("A0A024R442;E7ETB3;Q9ULA0;E7EMB6",
"A0A024R442;E7ETB3;Q9ULA0;E7EMB6;C9JBE1"
), V2 = c("443;465;447;382", "323;345;327;262;309")),
row.names = c(NA,
-2L), class = "data.frame")
Upvotes: 1
Reputation: 79208
base R:
data.frame(matrix(unlist(strsplit(as.matrix(df),";")),ncol=2))
X1 X2
1 A0A024R442 443
2 E7ETB3 465
3 Q9ULA0 447
4 E7EMB6 382
5 A0A024R442 323
6 E7ETB3 345
7 Q9ULA0 327
8 E7EMB6 262
9 C9JBE1 309
Upvotes: 2