Reputation: 23898
Wonder how to split column of a data.table.
Can split column of a data.frame
easily with the following code:
df <- data.frame(Test=c("A - B", "C - D"))
df
Test
1 A - B
2 C - D
library(reshape2)
reshape2:::colsplit(string = df[,1], pattern = " ", names = c("Var1", "Space", "Var2"))
Var1 Space Var2
1 A - B
2 C - D
But my attempt of splitting column of a data.table
fails
library(data.table)
dt <- data.table(Test=c("A - B", "C - D"))
dt
Test
1: A - B
2: C - D
reshape2:::colsplit(string = dt[,1, with=FALSE], pattern = " ", names = c("Var1", "Space", "Var2"))
Error: String must be an atomic vector
Upvotes: 0
Views: 1266
Reputation: 193517
I see that you're specifically asking for something with colsplit
, but I would suggest looking at some of the other alternatives out there, for example my cSplit
function.
The cSplit
approach would be as follows:
setnames(cSplit(dt, "Test", " "), c("Var1", "Space", "Var2"))[]
# Var1 Space Var2
# 1: A - B
# 2: C - D
The []
at the end is to print the result, but you can also just store the result as a new data.table
instead.
How does it compare in terms of efficiency?
fun1 <- function() {
reshape2:::colsplit(string = dt[[1]], pattern = " ",
names = c("Var1", "Space", "Var2"))
}
fun2 <- function() {
setnames(cSplit(dt, "Test", " "),
c("Var1", "Space", "Var2"))[]
}
dt <- rbindlist(replicate(5000, dt, FALSE))
dim(dt)
# [1] 10000 1
library(microbenchmark)
microbenchmark(fun1(), fun2(), times = 10)
# Unit: milliseconds
# expr min lq median uq max neval
# fun1() 2025.84703 2093.39687 2195.75822 2390.30666 2492.65946 10
# fun2() 34.08966 36.01145 43.28036 47.45962 57.57615 10
Why didn't your colsplit
approach work as you expected?
dt[,1]
is more like df[,1, drop = FALSE]
(try it--you'll get the same error that you got with your "data.table" attempt).
You will need either of the following:
> dt[[1]]
[1] "A - B" "C - D"
> dt$Test
[1] "A - B" "C - D"
Which is similar to what you get with:
> df[, 1]
[1] A - B C - D
Levels: A - B C - D
Upvotes: 1