lizzie
lizzie

Reputation: 606

Using dcast to reshape my data

I have been trying to use dcast(). I have this example:

class = c(rep("A1", 3), rep("B2", 5), rep("C3", 2), rep("D4", 4))
myvar = rnorm(14)
mydf = data.frame(class, myvar)

The output is:

> mydf
   class       myvar
1     A1 -0.27256423
2     A1  1.98435540
3     A1 -1.38193488
4     B2 -0.20843958
5     B2 -0.08651873
6     B2  1.34213192
7     B2  1.32848845
8     B2  2.26547847
9     C3 -0.60518721
10    C3  1.98786369
11    D4 -1.16306103
12    D4  1.09872582
13    D4  0.15150502
14    D4  0.49064154

I would like this to look like:

A1              B2           C3           D4
-0.27256423  -0.20843958   -0.60518721   -1.16306103
1.98435540   -0.08651873   1.98786369    1.09872582
-1.38193488  1.34213192                  0.15150502
             1.32848845                  0.49064154
             2.26547847

Upvotes: 0

Views: 89

Answers (3)

jazzurro
jazzurro

Reputation: 23574

Here is one way. Using spread(), I put the data in a wider format. I picked up all complete cases in each column with lapply(). I would like to give a credit to @Richard Scriven to the last step. This is something I learned from him. The last step adds NA to each vector. max(vapply(foo, length, 1L)) looks for the max length, which is 5 from $B2. You make each list item with length of 5. For instance, $C3 has two elements. So, you add three NAs using sapply().

library(tidyr)
library(magrittr)

spread(mydf, class, myvar) %>%
lapply(., function(x) x[complete.cases(x)]) -> foo
as.data.frame(sapply(foo, `length<-`, max(vapply(foo, length, 1L))))

#          A1          B2         C3         D4
#1 -0.2725642 -0.20843958 -0.6051872 -1.1630610
#2  1.9843554 -0.08651873  1.9878637  1.0987258
#3 -1.3819349  1.34213192         NA  0.1515050
#4         NA  1.32848845         NA  0.4906415
#5         NA  2.26547847         NA         NA

EDIT

Seeing the comment of @djas, I did the following. I think this is better.

split(mydf, mydf$class) %>%
lapply(., function(x) x[,2]) -> foo
as.data.frame(sapply(foo, `length<-`, max(vapply(foo, length, 1L))))

Here is one more idea with dplyr and tidyr.

spread(mydf, class, myvar) %>%
mutate_each(funs(c(.[complete.cases(.)], .[!complete.cases(.)]))) %>%
filter(rowSums(., na.rm = TRUE) != 0)

DATA

mydf <- structure(list(class = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("A1", "B2", "C3", "D4"
), class = "factor"), myvar = c(-0.27256423, 1.9843554, -1.38193488, 
-0.20843958, -0.08651873, 1.34213192, 1.32848845, 2.26547847, 
-0.60518721, 1.98786369, -1.16306103, 1.09872582, 0.15150502, 
0.49064154)), .Names = c("class", "myvar"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
"11", "12", "13", "14"))

Upvotes: 1

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

Expanding on my comment, simply add a secondary ID (the index position of each of the values in "class") and use that as the LHS of the formula in dcast.

library(splitstackshape)
set.seed(1) ## To make a reproducible example
class = c(rep("A1", 3), rep("B2", 5), rep("C3", 2), rep("D4", 4))
myvar = rnorm(14)
mydf = data.frame(class, myvar)
dcast.data.table(getanID(mydf, "class"), .id ~ class, value.var = "myvar")
#    .id         A1         B2         C3         D4
# 1:   1 -0.6264538  1.5952808  0.5757814  1.5117812
# 2:   2  0.1836433  0.3295078 -0.3053884  0.3898432
# 3:   3 -0.8356286 -0.8204684         NA -0.6212406
# 4:   4         NA  0.4874291         NA -2.2146999
# 5:   5         NA  0.7383247         NA         NA

Upvotes: 2

chepyle
chepyle

Reputation: 996

In your case, because the classes have different lengths, you can use "..." to spit out the full list with NAs. If the NAs are a problem, I would say that @djas suggestion to split() is your best bet.

library(reshape2)
class = c(rep("A1", 3), rep("B2", 5), rep("C3", 2), rep("D4", 4))
myvar = rnorm(14)
mydf = data.frame(class, myvar)
dcast(mydf,myvar~...)

         myvar         A1          B2          C3          D4
1  -2.66688596         NA          NA          NA -2.66688596
2  -1.65370213         NA -1.65370213          NA          NA
3  -1.53464694 -1.5346469          NA          NA          NA
4  -1.34557734         NA -1.34557734          NA          NA
5  -0.92107697         NA          NA          NA -0.92107697
6  -0.85066517 -0.8506652          NA          NA          NA
7  -0.23682480         NA -0.23682480          NA          NA
8  -0.02716902         NA          NA -0.02716902          NA
9   0.06063714         NA  0.06063714          NA          NA
10  0.07434025         NA          NA          NA  0.07434025
11  0.25034532         NA          NA          NA  0.25034532
12  0.70988347         NA  0.70988347          NA          NA
13  1.66455350         NA          NA  1.66455350          NA
14  2.61991105  2.6199110          NA          NA          NA

Upvotes: 0

Related Questions