mathlete

Reputation: 6692

How to convert data.frame to (flat) matrix?

How can I convert the data.frame below to the matrix shown? The first two columns of the data.frame contain the row variables; all combinations of the other columns (except the one containing the values) determine the columns. Ideally, I'm looking for a solution that does not require additional packages (so no reshape2 solution), and no ftable solution either.

(df <- data.frame(c1=rep(c(1, 2), each=8), c2=rep(c(1, 2, 1, 2), each=4),
                  gr=rep(c(1, 2), 8), subgr=rep(c(1,2), 4, each=2), val=1:16) )

c1 c2 gr1.subgr1 gr1.subgr2 gr2.subgr1 gr2.subgr2
1  1   1          3          2          4
1  2   5          7          6          8
2  1   9         11         10         12
2  2  13         15         14         16

Upvotes: 1

Views: 1622

Answers (2)

IRTFM

Reputation: 263411

Use an interaction variable to construct the groups:

newdf <- reshape(df, idvar=1:2, direction="wide", 
            timevar=interaction(df$gr,df$subgr) , 
            v.names="val", 
            drop=c("gr","subgr") ) 
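# The wide columns appear in order of first appearance in the data:
# (gr1, subgr1), (gr2, subgr1), (gr1, subgr2), (gr2, subgr2)
names(newdf)[3:6] <- c("gr1.subgr1", "gr2.subgr1", "gr1.subgr2", "gr2.subgr2")
newdf
   c1 c2 gr1.subgr1 gr2.subgr1 gr1.subgr2 gr2.subgr2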
1   1  1          1          2          3          4
5   1  2          5          6          7          8
9   2  1          9         10         11         12
13  2  2         13         14         15         16
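
A variant of the same idea, as a minimal sketch: it stores the interaction as a real column (the name `key` and the intermediate `sorted` are chosen here for illustration, not taken from the answer) and passes that column's name as `timevar`. Sorting by `gr` and `subgr` first is an assumption about how to get the columns in the order the question asks for, relying on `reshape()` creating the wide columns in order of first appearance. The final names are derived from the generated ones rather than typed by hand.

```r
df <- data.frame(c1 = rep(c(1, 2), each = 8), c2 = rep(c(1, 2, 1, 2), each = 4),
                 gr = rep(c(1, 2), 8), subgr = rep(c(1, 2), 4, each = 2), val = 1:16)

# Sort by (gr, subgr) so the wide columns come out as 1.1, 1.2, 2.1, 2.2.
sorted <- df[order(df$gr, df$subgr), ]

# `key` is a helper column introduced for this sketch.
sorted$key <- interaction(sorted$gr, sorted$subgr)

wide_df <- reshape(sorted[c("c1", "c2", "key", "val")],
                   idvar = c("c1", "c2"), timevar = "key",
                   v.names = "val", direction = "wide")

# reshape() names the new columns "val.<gr>.<subgr>"; rewrite them
# to the "gr<gr>.subgr<subgr>" form used in the question.
names(wide_df) <- sub("^val\\.([0-9]+)\\.([0-9]+)$", "gr\\1.subgr\\2",
                      names(wide_df))
wide_df
```

Because the names are computed from the column contents, they cannot drift out of sync with the values the way a hand-written vector can.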

Upvotes: 4

Chase

Reputation: 69201

From reading the help file, this seems like it should do what you want:

reshape(df, idvar = c("c1", "c2"), timevar = c("gr", "subgr")
        , direction = "wide")
   c1 c2 val.c(1, 2, 1, 2) val.c(1, 1, 2, 2)
1   1  1                NA                NA
5   1  2                NA                NA
9   2  1                NA                NA
13  2  2                NA                NA

I can't fully explain why it shows up with NA values. However, maybe this bit from the help page explains:

timevar 
the variable in long format that differentiates multiple records from the same 
group or individual. If more than one record matches, the first will be taken.

I initially took that to mean that R would use its partial matching capabilities if there was an ambiguity in the column names you gave it, but maybe not? Next, I tried combining gr and subgr into a single column:

df$newcol <- with(df, paste("gr.", gr, "subgr.", subgr, sep = ""))

And let's try this again:

reshape(df, idvar = c("c1", "c2"), timevar = "newcol"
        , direction = "wide", drop= c("gr","subgr"))

   c1 c2 val.gr.1subgr.1 val.gr.2subgr.1 val.gr.1subgr.2 val.gr.2subgr.2
1   1  1               1               2               3               4
5   1  2               5               6               7               8
9   2  1               9              10              11              12
13  2  2              13              14              15              16

Presto! I can't explain or figure out how to make it not append val. to the column names, but I'll leave you to figure that out on your own. I'm sure it's on the help page somewhere. It also put the groups in a different order than you requested, but the data seems to be right.
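
For the two loose ends above, here is one possible cleanup, offered as a sketch rather than the definitive way: strip the `val.` prefix with `sub()`, sort the value columns by name, and convert to a matrix. The names `wide`, `value_cols`, and `m` are introduced here for illustration.

```r
df <- data.frame(c1 = rep(c(1, 2), each = 8), c2 = rep(c(1, 2, 1, 2), each = 4),
                 gr = rep(c(1, 2), 8), subgr = rep(c(1, 2), 4, each = 2), val = 1:16)
df$newcol <- with(df, paste("gr.", gr, "subgr.", subgr, sep = ""))

wide <- reshape(df, idvar = c("c1", "c2"), timevar = "newcol",
                direction = "wide", drop = c("gr", "subgr"))

# Remove the "val." prefix that reshape() prepends to each wide column.
names(wide) <- sub("^val\\.", "", names(wide))

# Sort the value columns by name so gr 1 comes before gr 2.
value_cols <- sort(setdiff(names(wide), c("c1", "c2")))

# Convert to a plain numeric matrix and drop the leftover row names.
m <- as.matrix(wide[c("c1", "c2", value_cols)])
rownames(m) <- NULL
m
```

Sorting by name works here only because the labels differ in single digits; with ten or more groups an explicit column order would be safer.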

FWIW, here's a solution with reshape2:

> library(reshape2)
> dcast(c1 + c2 ~ gr + subgr, data = df, value.var = "val")
  c1 c2 1_1 1_2 2_1 2_2
1  1  1   1   3   2   4
2  1  2   5   7   6   8
3  2  1   9  11  10  12
4  2  2  13  15  14  16

Though you still have to clean up column names.
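
One way that cleanup might look, as a sketch in base R: the vector below simply copies the column names from the dcast output shown above, and `dcast_names` and `clean_names` are names made up for this example.

```r
# Column names as they appear in the dcast() output above.
dcast_names <- c("c1", "c2", "1_1", "1_2", "2_1", "2_2")

# Rewrite "<gr>_<subgr>" as "gr<gr>.subgr<subgr>"; names that do not
# match the pattern (c1, c2) are left unchanged.
clean_names <- sub("^([0-9]+)_([0-9]+)$", "gr\\1.subgr\\2", dcast_names)
clean_names
```

The result can then be assigned back with `names(x) <- clean_names`, where `x` stands for whatever object holds the dcast result.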

Upvotes: 2
