Reputation: 21
I have a data table in R with text columns of colon delimited data. I want to return a matrix/data table of results where one of the delimited values is returned for each cell.
The code pasted below demonstrates the problem and is a working solution. However, my actual data table is large (a few thousand rows and columns), and the pasted method takes on the order of a minute or two to complete.
I'm wondering if there is a more efficient way to perform this task? It appears that the sep2
option in fread
will be very useful for this problem once implemented.
Thanks!
> # Set up data.table
> DT <- data.table(A = c("cat:1:meow", "dog:2:bark", "cow:3:moo"),
B = c("dog:3:meow", "dog:4:bark", "frog:3:croak"),
C = c("dingo:0:moo", "cat:8:croak", "frog:1:moo"))
> print(DT)
A B C
1: cat:1:meow dog:3:meow dingo:0:moo
2: dog:2:bark dog:4:bark cat:8:croak
3: cow:3:moo frog:3:croak frog:1:moo
# grab the second delimited value in each cell
> part_index <- 2
> f = function(x) {vapply(t(x), function(x) {unlist(strsplit(x, ":", fixed=T))[part_index]}, character(1))}
> sapply(DT, f)
A B C
[1,] "1" "3" "0"
[2,] "2" "4" "8"
[3,] "3" "3" "1"
Upvotes: 2
Views: 206
Reputation: 269586
1) sub Try this:
DT[, lapply(.SD, sub, pattern = ".*:(.*):.*", replacement = "\\1")]
giving:
A B C
1: 1 3 0
2: 2 4 8
3: 3 3 1
2) fread or using fread
:
DT[, lapply(.SD, function(x) fread(paste(x, collapse = "\n"))$V2)]
3) matrix Note that similar code would work with plain character matrix without data.table:
m <- as.matrix(DT)
replace(m, TRUE, sub(".*:(.*):.*", "\\1", m))
giving:
A B C
[1,] "1" "3" "0"
[2,] "2" "4" "8"
[3,] "3" "3" "1"
3a) Even simpler (no regular expressions) would be:
replace(m, TRUE, read.table(text = m, sep = ":")$V2)
3b) or using fread
from data.table:
replace(m, TRUE, fread(paste(m, collapse = "\n"))$V2)
Upvotes: 4