statquant
statquant

Reputation: 14400

How can I reshape my data.table efficiently

I have a data.table DT

set.seed(1)
DT <- data.table(x=rep(c(1,2,3),each=4), y=c("A","B"), v=sample(1:100,12))
DT
    x y   v
 1: 1 A  29
 2: 1 B  92
 3: 1 A 100
 4: 1 B  82
 5: 2 A  28
 6: 2 B  26
 7: 2 A  18
 8: 2 B  22
 9: 3 A  30
10: 3 B  96
11: 3 A  15
12: 3 B   4

I would like to expand it like bellow, creating a new column for each value of x and reporting the v values, no structure in the data should be expected (not by blocks like bellow)

    x y v.1  v.2 v.3
 1: 1 A  29   NA  NA
 2: 1 B  92   NA  NA
 3: 1 A 100   NA  NA
 4: 1 B  82   NA  NA
 5: 2 A  NA   28  NA
 6: 2 B  NA   26  NA
 7: 2 A  NA   18  NA
 8: 2 B  NA   22  NA
 9: 3 A  NA   NA  30
10: 3 B  NA   NA  96
11: 3 A  NA   NA  15
12: 3 B  NA   NA   4

I asked a very similar question here but cannot adapt the answer G Grothendieck gave us at the time...

EDIT: As usual I just almost got it after I wrote the post... I just need to replace those 0 by NA (I might get 0 in v and I want to be able to dissociate v==0 from missing items)

DT2 <- DT[, {SUM.<-factor(x); data.table(model.matrix(~ SUM.:v + 0))}]
txtR) DT2
    SUM.1:v SUM.2:v SUM.3:v
 1:      29       0       0
 2:      92       0       0
 3:     100       0       0
 4:      82       0       0
 5:       0      28       0
 6:       0      26       0
 7:       0      18       0
 8:       0      22       0
 9:       0       0      30
10:       0       0      96
11:       0       0      15
12:       0       0       4

Upvotes: 3

Views: 160

Answers (3)

eddi
eddi

Reputation: 49448

You could simply loop over the x's and use data.table assignment:

setkey(DT, x)
for (i in unique(DT$x)) {
  DT[J(i), paste0("v.", i) := v]
}

P.S. I really wish the following worked, but .GRP is not available there:

DT[, paste0("v.", .GRP) := v, by = x]

edit one more solution (trying to get the above .GRP idea to work somehow), using rbind.fill (I didn't run very careful benches, but this seemed to scale quite well)

library(plyr)

cbind(DT,
      rbind.fill(DT[, list(list(setnames(data.table(v), paste0("v.", .GRP)))),
                      by = x]$V1))

Upvotes: 3

Frank
Frank

Reputation: 66819

set.seed(1)
DT <- data.table(x=rep(c(1,2,3),each=4), y=c("A","B"), v=sample(1:100,12))

This gives

    x y  v
 1: 1 A 27
 2: 1 B 37
 3: 1 A 57
 4: 1 B 89
 5: 2 A 20
 6: 2 B 86
 7: 2 A 97
 8: 2 B 62
 9: 3 A 58
10: 3 B  6
11: 3 A 19
12: 3 B 16

Next, the answer:

ux <- unique(DT$x)
DT[,c(v.=lapply(ux,function(i)v[x==i])),by="x,y"]

which gives

    x y v.1 v.2 v.3
 1: 1 A  27  NA  NA
 2: 1 A  57  NA  NA
 3: 1 B  37  NA  NA
 4: 1 B  89  NA  NA
 5: 2 A  NA  20  NA
 6: 2 A  NA  97  NA
 7: 2 B  NA  86  NA
 8: 2 B  NA  62  NA
 9: 3 A  NA  NA  58
10: 3 A  NA  NA  19
11: 3 B  NA  NA   6
12: 3 B  NA  NA  16

That answer might break in later versions of R, but the OP pointed out that this works too and may be faster:

DT[,paste0("v.",ux):=lapply(ux,function(i)v[x==i]),by="x"]

Upvotes: 4

Arun
Arun

Reputation: 118889

Here's one way:

tt <- model.matrix(data=DT, ~ factor(x):rep(1, nrow(DT)) + 0)
tt[tt==0] <- NA
cbind(DT, DT$v * tt)
#     x y   v factor(x)1:v factor(x)2:v factor(x)3:v
#  1: 1 A  69           69           NA           NA
#  2: 1 B  39           39           NA           NA
#  3: 1 A  76           76           NA           NA
#  4: 1 B  49           49           NA           NA
#  5: 2 A 100           NA          100           NA
#  6: 2 B  95           NA           95           NA
#  7: 2 A  36           NA           36           NA
#  8: 2 B  73           NA           73           NA
#  9: 3 A  86           NA           NA           86
# 10: 3 B  20           NA           NA           20
# 11: 3 A  59           NA           NA           59
# 12: 3 B  12           NA           NA           12

Upvotes: 3

Related Questions