Reputation: 606
I have been trying to use dcast(). I have this example:
class = c(rep("A1", 3), rep("B2", 5), rep("C3", 2), rep("D4", 4))
myvar = rnorm(14)
mydf = data.frame(class, myvar)
The output is:
> mydf
class myvar
1 A1 -0.27256423
2 A1 1.98435540
3 A1 -1.38193488
4 B2 -0.20843958
5 B2 -0.08651873
6 B2 1.34213192
7 B2 1.32848845
8 B2 2.26547847
9 C3 -0.60518721
10 C3 1.98786369
11 D4 -1.16306103
12 D4 1.09872582
13 D4 0.15150502
14 D4 0.49064154
I would like this to look like:
A1 B2 C3 D4
-0.27256423 -0.20843958 -0.60518721 -1.16306103
1.98435540 -0.08651873 1.98786369 1.09872582
-1.38193488 1.34213192 0.15150502
1.32848845 0.49064154
2.26547847
Upvotes: 0
Views: 89
Reputation: 23574
Here is one way. Using spread()
, I put the data in a wider format. I picked up all complete cases in each column with lapply()
. I would like to give a credit to @Richard Scriven to the last step. This is something I learned from him. The last step adds NA to each vector. max(vapply(foo, length, 1L))
looks for the max length, which is 5 from $B2
. You make each list item with length of 5. For instance, $C3
has two elements. So, you add three NAs using sapply()
.
library(tidyr)
library(magrittr)
spread(mydf, class, myvar) %>%
lapply(., function(x) x[complete.cases(x)]) -> foo
as.data.frame(sapply(foo, `length<-`, max(vapply(foo, length, 1L))))
# A1 B2 C3 D4
#1 -0.2725642 -0.20843958 -0.6051872 -1.1630610
#2 1.9843554 -0.08651873 1.9878637 1.0987258
#3 -1.3819349 1.34213192 NA 0.1515050
#4 NA 1.32848845 NA 0.4906415
#5 NA 2.26547847 NA NA
EDIT
Seeing the comment of @djas, I did the following. I think this is better.
split(mydf, mydf$class) %>%
lapply(., function(x) x[,2]) -> foo
as.data.frame(sapply(foo, `length<-`, max(vapply(foo, length, 1L))))
Here is one more idea with dplyr
and tidyr
.
spread(mydf, class, myvar) %>%
mutate_each(funs(c(.[complete.cases(.)], .[!complete.cases(.)]))) %>%
filter(rowSums(., na.rm = TRUE) != 0)
DATA
mydf <- structure(list(class = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("A1", "B2", "C3", "D4"
), class = "factor"), myvar = c(-0.27256423, 1.9843554, -1.38193488,
-0.20843958, -0.08651873, 1.34213192, 1.32848845, 2.26547847,
-0.60518721, 1.98786369, -1.16306103, 1.09872582, 0.15150502,
0.49064154)), .Names = c("class", "myvar"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
"11", "12", "13", "14"))
Upvotes: 1
Reputation: 193517
Expanding on my comment, simply add a secondary ID (the index position of each of the values in "class") and use that as the LHS of the formula in dcast
.
library(splitstackshape)
set.seed(1) ## To make a reproducible example
class = c(rep("A1", 3), rep("B2", 5), rep("C3", 2), rep("D4", 4))
myvar = rnorm(14)
mydf = data.frame(class, myvar)
dcast.data.table(getanID(mydf, "class"), .id ~ class, value.var = "myvar")
# .id A1 B2 C3 D4
# 1: 1 -0.6264538 1.5952808 0.5757814 1.5117812
# 2: 2 0.1836433 0.3295078 -0.3053884 0.3898432
# 3: 3 -0.8356286 -0.8204684 NA -0.6212406
# 4: 4 NA 0.4874291 NA -2.2146999
# 5: 5 NA 0.7383247 NA NA
Upvotes: 2
Reputation: 996
In your case, because the classes have different lengths, you can use "..." to spit out the full list with NAs. If the NAs are a problem, I would say that @djas suggestion to split()
is your best bet.
library(reshape2)
class = c(rep("A1", 3), rep("B2", 5), rep("C3", 2), rep("D4", 4))
myvar = rnorm(14)
mydf = data.frame(class, myvar)
dcast(mydf,myvar~...)
myvar A1 B2 C3 D4
1 -2.66688596 NA NA NA -2.66688596
2 -1.65370213 NA -1.65370213 NA NA
3 -1.53464694 -1.5346469 NA NA NA
4 -1.34557734 NA -1.34557734 NA NA
5 -0.92107697 NA NA NA -0.92107697
6 -0.85066517 -0.8506652 NA NA NA
7 -0.23682480 NA -0.23682480 NA NA
8 -0.02716902 NA NA -0.02716902 NA
9 0.06063714 NA 0.06063714 NA NA
10 0.07434025 NA NA NA 0.07434025
11 0.25034532 NA NA NA 0.25034532
12 0.70988347 NA 0.70988347 NA NA
13 1.66455350 NA NA 1.66455350 NA
14 2.61991105 2.6199110 NA NA NA
Upvotes: 0