Reputation: 8626
[UPDATE: there is now a native transpose()
function in data.table
package]
I often need to transpose a data.table
, every time it takes several lines of code and I am wondering if there's any better solution than mine.
if we take sample table
library(data.table)
mydata <- data.table(col0=c("row1","row2","row3"),
col1=c(11,21,31),
col2=c(12,22,32),
col3=c(13,23,33))
mydata
# col0 col1 col2 col3
# row1 11 12 13
# row2 21 22 23
# row3 31 32 33
and just transpose it with t()
, it will be transposed to the matrix with conversion to character
type, while applying data.table
to such matrix will lose row.names
:
t(mydata)
# [,1] [,2] [,3]
# col0 "row1" "row2" "row3"
# col1 "11" "21" "31"
# col2 "12" "22" "32"
# col3 "13" "23" "33"
data.table(t(mydata))
# V1 V2 V3
# row1 row2 row3
# 11 21 31
# 12 22 32
# 13 23 33
so I had to write a function for this:
tdt <- function(inpdt){
transposed <- t(inpdt[,-1,with=F]);
colnames(transposed) <- inpdt[[1]];
transposed <- data.table(transposed, keep.rownames=T);
setnames(transposed, 1, names(inpdt)[1]);
return(transposed);
}
tdt(mydata)
# col0 row1 row2 row3
# col1 11 21 31
# col2 12 22 32
# col3 13 23 33
is there anything I could optimize here or do it in "nicer" way?
Upvotes: 61
Views: 108598
Reputation: 10383
The current docs show a builtin transpose
method.
Specifically, you can do:
transpose(mydata, keep.names = "col", make.names = "col0")
## col row1 row2 row3
## 1: col1 11 21 31
## 2: col2 12 22 32
## 3: col3 13 23 33
Upvotes: 44
Reputation: 81
df <- as.data.frame(t(mydata))
is what I tried and df
is a data.frame
and the column names on mydata
are now row names on df
Upvotes: 8
Reputation: 193527
Why not just melt
and dcast
the data.table
?
require(data.table)
dcast(melt(mydata, id.vars = "col0"), variable ~ col0)
# variable row1 row2 row3
# 1: col1 11 21 31
# 2: col2 12 22 32
# 3: col3 13 23 33
Upvotes: 62
Reputation: 1
The tdt function which I provide below should be faster
tdt <- function(DT, transpose.col, ...) {
# The transpose function is efficient, but lacks the keeping of row and colnames
new.row.names <- colnames(DT)
new.row.names <- new.row.names[!new.row.names %in% transpose.col]
new.col.names <- DT[, transpose.col, with = F]
DT <- DT[, !colnames(DT) %in% transpose.col, with = F]
DT <- transpose(DT, ...)
colnames(DT) <- unlist(new.col.names)
DT$var <- new.row.names
# change order of DT after transposing
setcolorder(DT, c("var", setdiff(names(DT), "var")))
colnames(DT)[1] <- transpose.col
return(DT)
}
library(microbenchmark); library(microbenchmarkCore)
DT <- data.table(x=1:1000, y=paste("name", 1:1000, sep = "_"), z = paste("test", 1:1000, sep = "."))
rbind(microbenchmark(tdt(DT, "y")),
microbenchmark(dcast(melt(DT, id.vars = "y"), variable ~ y)),
microbenchmark(DT[, data.table(t(.SD), keep.rownames=TRUE), .SDcols=-"y"]))
Unit: milliseconds
expr min lq mean median uq max neval cld
tdt(DT, "y") 3.463842 3.719341 4.308158 3.911599 4.576477 20.406940 100 a
dcast(melt(DT, id.vars = "y"), variable ~ y) 5.146119 5.496761 5.826647 5.580796 5.870584 9.536541 100 a
DT[, data.table(t(.SD), keep.rownames = TRUE), .SDcols = -"y"] 29.975567 34.554989 40.807036 36.724430 39.102396 104.242218 100 b
d <- tdt(DT, "y")
d[1:2, 1:11]
y name_1 name_2 name_3 name_4 name_5 name_6 name_7 name_8 name_9 name_10
1: x 1 2 3 4 5 6 7 8 9 10
2: z test.1 test.2 test.3 test.4 test.5 test.6 test.7 test.8 test.9 test.10
DT[1:10, 1:3]
x y z
1: 1 name_1 test.1
2: 2 name_2 test.2
3: 3 name_3 test.3
4: 4 name_4 test.4
5: 5 name_5 test.5
6: 6 name_6 test.6
7: 7 name_7 test.7
8: 8 name_8 test.8
9: 9 name_9 test.9
10: 10 name_10 test.10
class(d)
[1] "data.table" "data.frame"
Upvotes: -1
Reputation: 1095
Here's a solution that uses a wrapper to tidy up the output of the data.table transpose
function.
With really large data sets this seems to be more efficient than the dcast/melt approach (I tested it on a 8000 row x 29000 column data set, the below function works in about 3 minutes but dcast/melt crashed R):
# Function to clean up output of data.table transpose:
transposedt <- function(dt, varlabel) {
require(data.table)
dtrows = names(dt)
dtcols = as.list(c(dt[,1]))
dtt = transpose(dt)
dtt[, eval(varlabel) := dtrows]
setnames(dtt, old = names(dtt), new = c(dtcols[[1]], eval(varlabel)))
dtt = dtt[-1,]
setcolorder(dtt, c(eval(varlabel), names(dtt)[1:(ncol(dtt) - 1)]))
return(dtt)
}
# Some dummy data
mydt <- data.table(col0 = c(paste0("row", seq_along(1:100))),
col01 = c(sample(seq_along(1:100), 100)),
col02 = c(sample(seq_along(1:100), 100)),
col03 = c(sample(seq_along(1:100), 100)),
col04 = c(sample(seq_along(1:100), 100)),
col05 = c(sample(seq_along(1:100), 100)),
col06 = c(sample(seq_along(1:100), 100)),
col07 = c(sample(seq_along(1:100), 100)),
col08 = c(sample(seq_along(1:100), 100)),
col09 = c(sample(seq_along(1:100), 100)),
col10 = c(sample(seq_along(1:100), 100)))
# Apply the function:
mydtt <- transposedt(mydt, "myvariables")
# View the results:
> mydtt[,1:10]
myvariables row1 row2 row3 row4 row5 row6 row7 row8 row9
1: col01 58 53 14 96 51 30 26 15 68
2: col02 6 72 46 62 69 9 63 32 78
3: col03 21 36 94 41 54 74 82 64 15
4: col04 68 41 66 30 31 78 51 67 26
5: col05 49 30 52 78 73 71 5 66 44
6: col06 89 35 79 67 6 88 62 97 73
7: col07 66 15 27 29 58 40 35 82 57
8: col08 55 47 83 30 23 65 48 56 87
9: col09 41 10 21 33 55 81 94 25 34
10: col10 35 17 41 44 21 66 69 61 46
What is also useful is that columns (ex rows) occur in their original order and you can name the variables column something meaningful.
Upvotes: 2
Reputation: 22293
Here's an alternative solution that only uses data.table
and that is closer to the original idea of using t
to transpose.
mydata[, data.table(t(.SD), keep.rownames=TRUE), .SDcols=-"col0"]
## rn V1 V2 V3
## 1: col1 11 21 31
## 2: col2 12 22 32
## 3: col3 13 23 33
If keeping the rownames is important, setnames
can be used. Admittedly this becomes a bit clumsy and probably the recast solution is preferable.
setnames(mydata[, data.table(t(.SD), keep.rownames=TRUE), .SDcols=-"col0"],
mydata[, c('rn', col0)])[]
## rn row1 row2 row3
## 1: col1 11 21 31
## 2: col2 12 22 32
## 3: col3 13 23 33
Upvotes: 21