Reputation: 17767
I'm trying to put some matrices in a dataframe in R, something like:
m <- matrix(c(1,2,3,4), nrow=2, ncol=2)
df <- data.frame(id=1, mat=m)
But when I do that, I get a dataframe with 2 rows and 3 columns instead of a dataframe with 1 row and 2 columns.
From the documentation, it seems I have to protect my matrix with I():
df <- data.frame(id=1, mat=I(m))
str(df)
'data.frame': 2 obs. of 2 variables:
$ id : num 1 1
$ mat: AsIs [1:2, 1:2] 1 2 3 4
As I understand it, the dataframe contains one row for each row of the matrix, and the mat field is a list of matrix column values.
So how can I obtain a dataframe containing matrices?
Thanks!
Upvotes: 6
Views: 15877
Reputation: 39667
To get a data.frame with 1 row and 2 columns for the given example, you have to put the matrix inside a list.
m <- matrix(1:4, 2)
x <- list2DF(list(id=1, mat=list(m)))
x
# id mat
#1 1 1, 2, 3, 4
str(x)
#'data.frame': 1 obs. of 2 variables:
# $ id : num 1
# $ mat:List of 1
# ..$ : int [1:2, 1:2] 1 2 3 4
y <- data.frame(id=1, mat=I(list(m)))
y
# id mat
#1 1 1, 2, 3, 4
str(y)
#'data.frame': 1 obs. of 2 variables:
# $ id : num 1
# $ mat:List of 1
# ..$ : int [1:2, 1:2] 1 2 3 4
# ..- attr(*, "class")= chr "AsIs"
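The same idea extends to several rows, one matrix per row. A small sketch, assuming made-up matrices m1 and m2 of different shapes; each cell of the list column is retrieved with [[ ]] and keeps its dim attribute, even though print() shows it flattened:

```r
# Two hypothetical matrices of different shapes
m1 <- matrix(1:4, nrow = 2)
m2 <- matrix(1:6, nrow = 3)

# One row per matrix; I() keeps the list intact as a single column
d <- data.frame(id = 1:2, mat = I(list(m1, m2)))

nrow(d)          # one row per matrix
dim(d$mat[[2]])  # the second cell holds the 3 x 2 matrix
```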
To create a data.frame with a column containing a matrix (with the given data, 2 rows and 2 columns), using I() directly when creating the data.frame is straightforward. An alternative that avoids the AsIs class is to insert the matrix afterwards, as already shown by others.
m <- matrix(1:4, 2)
x <- data.frame(id=1, mat=I(m))
str(x)
#'data.frame': 2 obs. of 2 variables:
# $ id : num 1 1
# $ mat: 'AsIs' int [1:2, 1:2] 1 2 3 4
y <- data.frame(id=rep(1, nrow(m)))
y[["m"]] <- m
#y["m"] <- m #Alternative
#y[,"m"] <- m #Alternative
#y$m <- m #Alternative
str(y)
#'data.frame': 2 obs. of 2 variables:
# $ id: num 1 1
# $ m : int [1:2, 1:2] 1 2 3 4
z <- `[<-`(data.frame(id=rep(1, nrow(m))), , "mat", m)
str(z)
#'data.frame': 2 obs. of 2 variables:
# $ id : num 1 1
# $ mat: int [1:2, 1:2] 1 2 3 4
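Once inserted, the matrix behaves as a single column. A sketch with the same m (the id values here are made up so the rows can be told apart) showing that $ returns the whole matrix and that row subsetting of the data frame also subsets the rows of the matrix:

```r
m <- matrix(1:4, 2)
y <- data.frame(id = seq_len(nrow(m)))
y$m <- m

length(y)         # 2: the matrix counts as one column
y$m[, 2]          # second column of the stored matrix: 3 4
y2 <- y[2, ]      # row subsetting keeps the matrix as 1 row x 2 columns
dim(y2$m)
```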
Alternatively, the data can be stored in a list.
m <- matrix(1:4, 2)
x <- list(id=1, mat=m)
x
#$id
#[1] 1
#
#$mat
# [,1] [,2]
#[1,] 1 3
#[2,] 2 4
str(x)
#List of 2
# $ id : num 1
# $ mat: int [1:2, 1:2] 1 2 3 4
Upvotes: 0
Reputation: 325
Data frames containing matrix columns do have their uses in specialized scenarios. These are cases where you have a whole vector of some variable for every observation in your data set. There are two cases that I have come across where this is common:
If you're working with data frames, there are a couple of obvious ways to handle this data, both of which are inefficient. I'll use the Bayesian case as an example:
Data frames with matrix columns are a very useful solution to this situation. The posterior stays in a matrix that has the same number of rows as the data frame, but that matrix is recognized as only a single "column" in the data frame, and referring to that column using df$mat will return the matrix. You can even use some dplyr functions like filter() to return the corresponding rows of the matrix, but this is a bit experimental.
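A minimal base-R sketch of that setup, assuming a made-up draws matrix with one row per observation and one column per posterior draw (in practice it would come from your sampler). Per-observation summaries are computed row-wise, and ordinary subsetting of the data frame keeps the matching rows of the matrix:

```r
# Hypothetical data: 3 observations, 4 "posterior draws" each
d <- data.frame(obs = 1:3, x = c(0.5, 1.2, 2.0))
d$draws <- matrix(c(1, 2, 3,    # filled column by column:
                    2, 3, 4,    # each line here is one draw (column)
                    3, 4, 5,
                    4, 5, 6), nrow = 3)

# Per-observation posterior mean, one value per row
d$post_mean <- rowMeans(d$draws)

# Base subsetting also subsets the rows of the matrix column
sub <- d[d$x > 1, ]
dim(sub$draws)   # 2 observations x 4 draws
```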
The easiest method to create the matrix column is in two steps. First create the data frame without the matrix column, then add the matrix column with a simple assignment. I haven't found a one-step solution that doesn't involve I(), which changes the column type (the column gets the "AsIs" class).
m <- matrix(c(1,2,3,4), nrow=2, ncol=2)
df <- data.frame(id = rep(1, nrow(m)))
df$mat <- m
names(df)
# [1] "id" "mat"
str(df)
# 'data.frame': 2 obs. of 2 variables:
# $ id : num 1 1
# $ mat: num [1:2, 1:2] 1 2 3 4
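The two steps can be wrapped in a small helper. add_matrix_col() below is a hypothetical name, not a function from any package; it only checks that the row counts match before assigning:

```r
# Hypothetical helper: add matrix `m` to data frame `df` as one column `name`
add_matrix_col <- function(df, name, m) {
  stopifnot(is.data.frame(df), is.matrix(m), nrow(m) == nrow(df))
  df[[name]] <- m   # [[<- stores the whole matrix as a single column
  df
}

m  <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
df <- add_matrix_col(data.frame(id = rep(1, nrow(m))), "mat", m)
str(df)
```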
Upvotes: 4
Reputation: 105
I came across the same problem while trying to understand the gasoline data in the pls package, and used $ for the job.
First, let's create a matrix, let's call it spectra_mat, then a vector called response_var1.
spectra_mat = matrix(1:45, 9, 5)
response_var1 = 1:9
Now we put the vector response_var1 in a new data frame (let's call it df) and add the matrix to it with $.
df = data.frame(response_var1)
df$spectra = spectra_mat
To check:
str(df)
'data.frame': 9 obs. of 2 variables:
$ response_var1: int 1 2 3 4 5 6 7 8 9
$ spectra : int [1:9, 1:5] 1 2 3 4 5 6 7 8 9 10 ...
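The payoff of this layout is that the matrix can be used as a single term in a model formula; in pls, that is what lets a call such as plsr(octane ~ NIR, ...) use the whole NIR matrix. A base-R sketch of the same effect with model.matrix(), so no pls installation is assumed; each column of the matrix becomes its own model column:

```r
spectra_mat <- matrix(1:45, 9, 5)
df <- data.frame(response_var1 = 1:9)
df$spectra <- spectra_mat

# The matrix term expands to one model column per matrix column
X <- model.matrix(~ spectra, data = df)
dim(X)   # 9 rows; intercept + 5 spectra columns = 6 columns
```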
Upvotes: 5
Reputation: 5865
A much easier way to do this is to define the data frame with a placeholder for the matrix:
m <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
df <- data.frame(id = 1, mat = rep(0, nrow(m)))
Then assign the matrix. There is no need to play with the class of a list or to use an *apply() function.
df$mat <- m
Upvotes: 5
Reputation: 226322
I find data.frames containing matrices mind-bendingly weird, but the only way I know to achieve this is hidden in stats:::simulate.lm.
Try this, then poke through and see what's happening:
d <- data.frame(y=1:5,n=5)
g0 <- glm(cbind(y,n-y)~1,data=d,family=binomial)
debug(stats:::simulate.lm)
s <- simulate(g0, nsim = 5)
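If you'd rather not step through the debugger, the returned object can also be inspected directly. As far as I can tell from the stats source, for a binomial fit with a two-column cbind() response, each simulated column is itself a two-column matrix; a sketch:

```r
d  <- data.frame(y = 1:5, n = 5)
g0 <- glm(cbind(y, n - y) ~ 1, data = d, family = binomial)

set.seed(1)                  # simulation is random; fix the seed
s <- simulate(g0, nsim = 5)

class(s)                     # a data.frame ...
sapply(s, is.matrix)         # ... whose columns are all matrices
dim(s[[1]])                  # one row per observation, 2 columns
```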
This is the weird, back-door solution. Create a list, change its class to data.frame, and then (this is required) set the names and row.names manually (if you don't do those final steps, the data will still be in the object, but it will print out as though it had zero rows ...)
m1 <- matrix(1:10,ncol=2)
m2 <- matrix(5:14,ncol=2)
dd <- list(m1,m2)
class(dd) <- "data.frame"
names(dd) <- LETTERS[1:2]
row.names(dd) <- 1:5
dd
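The same back-door construction can be written in one call with base R's structure(), which sets the names, row.names and class attributes together (this is also the form dput() prints for data frames). A sketch with the same m1 and m2:

```r
m1 <- matrix(1:10, ncol = 2)
m2 <- matrix(5:14, ncol = 2)

# Names come from the list; row.names and class are set as attributes
dd2 <- structure(list(A = m1, B = m2),
                 row.names = 1:5,
                 class = "data.frame")

dim(dd2)               # 5 rows, 2 (matrix) columns
identical(dd2$B, m2)   # the matrix is stored unchanged
```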
Upvotes: 7
Reputation: 29367
The result you got (2 rows x 3 columns) is what is to be expected from R, as it amounts to cbind-ing a vector (id, with recycling) and a matrix (m).
IMO, it would be better to use a list or an array (when dimensions agree; no mix of numeric and factor values allowed) if you really want to bind different data structures. Otherwise, just cbind-ing your matrix to an existing data.frame with the same number of rows will do the job. For example:
x1 <- replicate(2, rnorm(10))
x2 <- replicate(2, rnorm(10))
x12l <- list(x1=x1, x2=x2)
x12a <- array(rbind(x1, x2), dim=c(10,2,2))
and the results read:
> str(x12l)
List of 2
$ x1: num [1:10, 1:2] -0.326 0.552 -0.675 0.214 0.311 ...
$ x2: num [1:10, 1:2] -0.164 0.709 -0.268 -1.464 0.744 ...
> str(x12a)
num [1:10, 1:2, 1:2] -0.326 0.552 -0.675 0.214 0.311 ...
Lists are easier to use if you plan to use matrices of varying dimensions, and provided they are organized in the same way (by rows) as an external data.frame, you can subset them just as easily. Here is an example:
df1 <- data.frame(grp=gl(2, 5, labels=LETTERS[1:2]),
age=sample(seq(25,35), 10, replace=TRUE))
with(df1, tapply(x12l$x1[,1], list(grp, age), mean))
You can also use lapply (for lists) and apply (for arrays).
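A sketch of both, assuming deterministic stand-ins for x1 and x2 so the results are reproducible. The array here is built with c(x1, x2) so that x12a[, , k] is the k-th matrix:

```r
# Deterministic stand-ins for x1 and x2 (10 rows x 2 columns each)
x1 <- matrix(1:20, nrow = 10)
x2 <- matrix(21:40, nrow = 10)
x12l <- list(x1 = x1, x2 = x2)
x12a <- array(c(x1, x2), dim = c(10, 2, 2))   # x12a[, , k] is the k-th matrix

grp <- gl(2, 5, labels = LETTERS[1:2])        # row grouping, as in df1

# List: keep the rows of group "A" in every matrix
subA <- lapply(x12l, function(m) m[grp == "A", , drop = FALSE])

# Array: column means of each matrix (one output column per matrix)
cm <- apply(x12a, 3, colMeans)
```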
Upvotes: 1