beginneR
beginneR

Reputation: 3291

Reshaping data using the data.table package

I have asked the same question a few days ago ( click here), but didn't mention that a result using data.table would be appreciated

The "aggregate-solution" works fine, even though it is pretty slow! I am searching for a faster way to solve this.

I want to reshape the following data.frame:

df <- data.frame(x=c("p1","p1","p2"),y=c("a","b","a"),z=c(14,14,16))
df
   x y  z
1 p1 a 14
2 p1 b 14
3 p2 a 16

so that it looks like this one:

df2 <- data.frame(x=c("p1","p2"),a=c(1,1),b=c(1,0),z=c(14,16))
   x a b  z
1 p1 1 1 14
2 p2 1 0 16

The variable y in df should be broken so that its elements are new variables, each dummy coded. All other variables (in this case just z) are equal for each person (p1,p2 etc.). The only variable where a specific person p has different values is y.
The reason I want this is because I need to merge this dataset with other ones by variable x. Thing is, it needs to be one row per person (p1,p2 etc).

Upvotes: 3

Views: 608

Answers (1)

eddi
eddi

Reputation: 49448

Going to wide format is a little awkward currently in data.table, but I think this works:

library(data.table)
dt = data.table(x=c("p1","p1","p2"),y=c("a","b","a"),z=c(14,14,16))

setkey(dt, x, y)
dt[CJ(unique(x), unique(y)), list(.N, z)][,
   setNames(as.list(c(N, z[!is.na(z)][1])), c(y, 'z')), by = x]
#    x a b  z
#1: p1 1 1 14
#2: p2 1 0 16

The CJ part joins by all combinations of unique x and y, and then in that join there is a hidden by-without-by that's used to compute counts via .N. Once you have those it's just a matter of putting them horizontally for each x together with any non-NA z (I chose the first) and that's accomplished using as.list. Finally setNames sets the column names correctly.

Upvotes: 5

Related Questions