Reputation: 17090
Here is the problem. There is a matrix with N rows and C columns, and two factors, id and group, both of length N. For example:
m <- matrix( 1:25, nrow= 5, byrow= TRUE )
id <- factor( c( "A", "A", "A", "B", "B" ) )
group <- factor( c( "a", "b", "c", "a", "c" ) )
Not all combinations of the two factors are present, but each combination that is present occurs exactly once. The task is to transform the matrix m so that it has length( levels( id ) ) rows and length( levels( group ) ) * C columns. In other words, create a matrix in which each column corresponds to a combination of an original column and a level of the factor group. Cells for non-existent combinations of id and group are filled with NA. Here is the desired output for the example above:
  a.1 a.2 a.3 a.4 a.5 b.1 b.2 b.3 b.4 b.5 c.1 c.2 c.3 c.4 c.5
A   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
B  16  17  18  19  20  NA  NA  NA  NA  NA  21  22  23  24  25
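For this small example the target layout can be written out by hand, which also gives a reference to compare any solution against. This is only a sketch: the name expected is introduced here for illustration and does not appear in the original post.

```r
# Example data from the question
m     <- matrix(1:25, nrow = 5, byrow = TRUE)
id    <- factor(c("A", "A", "A", "B", "B"))
group <- factor(c("a", "b", "c", "a", "c"))

C <- ncol(m)

# Hand-built target: one row per level of id,
# one block of C columns per level of group
expected <- rbind(
  A = c(m[1, ], m[2, ], m[3, ]),      # (A,a), (A,b), (A,c)
  B = c(m[4, ], rep(NA, C), m[5, ])   # (B,a), missing (B,b), (B,c)
)

# Column names combine the group level with the original column number
colnames(expected) <- paste(rep(levels(group), each = C), seq_len(C), sep = ".")

expected
```

A candidate solution can then be checked with something like all.equal(result, expected), bearing in mind that some approaches return numeric rather than integer values.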
I wrote my own function, but it is terribly inefficient, and I'm sure it duplicates the functionality of something extremely simple.
matrixReshuffle <- function( m, ids.row, factor.group ) {
  nr <- nrow( m )
  nc <- ncol( m )
  if( is.null( colnames( m ) ) ) colnames( m ) <- 1:nc
  ret <- NULL
  for( id in levels( ids.row ) ) {
    r <- c()
    for( fg in levels( factor.group ) ) {
      d <- m[ ids.row == id & factor.group == fg, , drop= FALSE ]
      if( nrow( d ) > 1 )
        stop( sprintf( "Too many matches for ids.row= %s and factor.group= %s", id, fg ) )
      else if( nrow( d ) < 1 ) {
        r <- c( r, rep( NA, nc ) )
      } else {
        r <- c( r, d[1,] )
      }
    }
    ret <- rbind( ret, r )
  }
  colnames( ret ) <- paste( rep( levels( factor.group ), each= nc ), rep( colnames( m ), length( levels( factor.group ) ) ), sep= "." )
  rownames( ret ) <- levels( ids.row )
  return( ret )
}
Upvotes: 3
Views: 1136
Reputation: 17090
This is a version of @Arun's response, slightly modified so that it is easier (for me) to understand. Also, I am always wary of replicating grouping factors by hand; in practice I have found this to be a potential source of systematic error. It is better to attach id and group to the data directly and let melt() do the job of replicating them. But those are just minor things.
# add the aggregating variables to the matrix, converted to data frame
df <- data.frame( m )
df$id <- id
df$group <- group
# reshape the data frame
library( reshape2 )
df.m <- melt( df, c( "id", "group" ) )
df <- dcast( df.m, id ~ group + variable, value.var= "value" )
# df has the required shape, but convert it back to a matrix
rownames( df ) <- df$id
df$id <- NULL
m.reshaped <- as.matrix( df )
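One detail worth noting: because data.frame( m ) names the columns X1 to X5 and dcast() joins the group level and the variable name with an underscore, the resulting column names should look like a_X1 rather than a.1. If the exact names from the question are wanted, a small renaming step along these lines may help. It is a sketch on a stand-in vector of names (nm and fixed_nm are illustrative names) and assumes that no group level itself contains the string "_X".

```r
# Stand-in for colnames( m.reshaped ) as produced by the code above
nm <- c("a_X1", "a_X2", "b_X1", "c_X5")

# Replace the first literal "_X" in each name with "."
fixed_nm <- sub("_X", ".", nm, fixed = TRUE)

fixed_nm
```

Applied to the real result, this would be colnames( m.reshaped ) <- sub( "_X", ".", colnames( m.reshaped ), fixed= TRUE ).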
Upvotes: 1
Reputation: 118799
Following @Aaron's suggestions, using melt and acast from reshape2:
require(reshape2)
df <- as.data.frame(m)
names(df) <- seq_len(ncol(df))
df.m <- melt(df)
df.m$id <- rep(id, nrow(df.m)/length(id))
df.m$group <- rep(group, nrow(df.m)/length(group))
o <- acast(df.m, id ~ group+variable, value.var="value")
colnames(o) <- sub("_", ".", colnames(o))
#   a.1 a.2 a.3 a.4 a.5 b.1 b.2 b.3 b.4 b.5 c.1 c.2 c.3 c.4 c.5
# A   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
# B  16  17  18  19  20  NA  NA  NA  NA  NA  21  22  23  24  25
Since acast returns a matrix rather than a data frame, o needs no further conversion.
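The rep() calls work because melting a data frame stacks its columns one after another, so each row-level factor has to be repeated once per column. The following base R sketch builds the same kind of long table by hand to make that alignment visible. The name long is introduced here for illustration, and the sketch relies on as.vector() reading a matrix column by column.

```r
# Example data from the question
m     <- matrix(1:25, nrow = 5, byrow = TRUE)
id    <- factor(c("A", "A", "A", "B", "B"))
group <- factor(c("a", "b", "c", "a", "c"))

# Long format: one row per cell of m, columns stacked in order
long <- data.frame(
  id       = rep(id, times = ncol(m)),              # repeated once per column
  group    = rep(group, times = ncol(m)),           # repeated once per column
  variable = rep(seq_len(ncol(m)), each = nrow(m)), # original column number
  value    = as.vector(m)                           # column-major order
)

head(long)

# Spot check: id B, group c, column 3 should be m[5, 3]
long$value[long$id == "B" & long$group == "c" & long$variable == 3]
```

If the factors were repeated with each= instead of times=, the labels would no longer line up with the values, which is the kind of silent error that hand-replicated factors can introduce.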
Upvotes: 2
Reputation: 37754
For all the matrix indexing fans out there...
C <- ncol(m)
to.row <- matrix(rep(as.numeric(id), C), ncol=C)
to.col <- sweep(col(m),1,(as.numeric(group)-1)*C,`+`)
out <- array(dim=c(nlevels(id), nlevels(group)*C),
             dimnames=list(levels(id), as.vector(t(outer(levels(group), 1:C, paste, sep=".")))))
out[cbind(as.vector(to.row), as.vector(to.col))] <- m
out
#   a.1 a.2 a.3 a.4 a.5 b.1 b.2 b.3 b.4 b.5 c.1 c.2 c.3 c.4 c.5
# A   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
# B  16  17  18  19  20  NA  NA  NA  NA  NA  21  22  23  24  25
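The key step is indexing with a two-column matrix: each row of the index is a (row, column) pair naming one destination cell. The sketch below spells out the same idea using row() and col() instead of sweep(). The names dest_row and dest_col are illustrative and not part of the answer above.

```r
# Example data from the question
m     <- matrix(1:25, nrow = 5, byrow = TRUE)
id    <- factor(c("A", "A", "A", "B", "B"))
group <- factor(c("a", "b", "c", "a", "c"))
C     <- ncol(m)

# Destination row for every cell of m: the integer code of that row's id
dest_row <- as.integer(id)[row(m)]

# Destination column: the original column, shifted by one block of C
# columns for each preceding level of group
dest_col <- as.vector(col(m)) + (as.integer(group)[row(m)] - 1L) * C

# Empty target filled with NA, then one assignment places all cells of m
out <- matrix(NA_integer_, nrow = nlevels(id), ncol = nlevels(group) * C,
              dimnames = list(levels(id),
                              paste(rep(levels(group), each = C), seq_len(C), sep = ".")))
out[cbind(dest_row, dest_col)] <- m

out["B", "c.3"]  # the cell that came from m[5, 3]
```

Cells that no (id, group) pair maps to are never assigned, so they keep their initial NA.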
Upvotes: 2