Reputation: 3291
I have asked the same question a few days ago ( click here), but didn't mention that a result using data.table
would be appreciated
The "aggregate-solution" works fine, even though it is pretty slow! I am searching for a faster way to solve this.
I want to reshape the following data.frame:
df <- data.frame(x=c("p1","p1","p2"),y=c("a","b","a"),z=c(14,14,16))
df
x y z
1 p1 a 14
2 p1 b 14
3 p2 a 16
so that it looks like this one:
df2 <- data.frame(x=c("p1","p2"),a=c(1,1),b=c(1,0),z=c(14,16))
x a b z
1 p1 1 1 14
2 p2 1 0 16
The variable y
in df
should be broken so that its elements are new variables, each dummy coded. All other variables (in this case just z
) are equal for each person (p1,p2 etc.). The only variable where a specific person p has different values is y
.
The reason I want this is because I need to merge this dataset with other ones by variable x
. Thing is, it needs to be one row per person (p1
,p2
etc).
Upvotes: 3
Views: 608
Reputation: 49448
Going to wide format is a little awkward currently in data.table
, but I think this works:
library(data.table)
dt = data.table(x=c("p1","p1","p2"),y=c("a","b","a"),z=c(14,14,16))
setkey(dt, x, y)
dt[CJ(unique(x), unique(y)), list(.N, z)][,
setNames(as.list(c(N, z[!is.na(z)][1])), c(y, 'z')), by = x]
# x a b z
#1: p1 1 1 14
#2: p2 1 0 16
The CJ
part joins by all combinations of unique x
and y
, and then in that join there is a hidden by-without-by that's used to compute counts via .N
. Once you have those it's just a matter of putting them horizontally for each x
together with any non-NA z
(I chose the first) and that's accomplished using as.list
. Finally setNames
sets the column names correctly.
Upvotes: 5