The documentation for the readMat() function says: "For the MAT v5 format, cell structures are read into R as a list structure."
This creates a problem for me, as I am not able to convert the list object back to the original table structure. In the original files I inherited, each row (rather than each column) holds the answers to a different questionnaire (row 1 = questionnaire 1, row 2 = questionnaire 2, etc.), but readMat() builds the list vertically (column by column), so my questionnaire items end up interleaved.
Here's code to reproduce a simplified example of the desired output, which is also how the original file appears in the Matlab cell structure:
list1 <- list("2", "34", "17", NA, NA, NA)
list2 <- list("32", "43", NA, NA, NA, NA)
list3 <- list("C", "D", "A", "F", "G", "I")
list4 <- list("455", NA, NA, NA, NA, NA)
df <- data.frame()
df <- rbind(df,list1,list2,list3,list4)
colnames(df) <- NULL
rownames(df) <- NULL
df
This outputs the following (DESIRED OUTPUT/ORIGINAL MATLAB STRUCTURE):
1 2 34 17 <NA> <NA> <NA>
2 32 43 <NA> <NA> <NA> <NA>
3 C D A F G I
4 455 <NA> <NA> <NA> <NA> <NA>
This way I can select by row instead of having a scrambled order of observations. Note that I replaced the NULL values with NA for this example; otherwise I got an error while building the data frame.
However, to reproduce what importing from Matlab into R with readMat() actually yields, we need hefty code like this:
list1 <- list(matrix("2"))
list2 <- list(matrix("32"))
list3 <- list(matrix("C"))
list4 <- list(matrix("455"))
list5 <- list(matrix("34"))
list6 <- list(matrix("43"))
list7 <- list(matrix("D"))
list8 <- NULL
list9 <- list(matrix("17"))
list10 <- NULL
list11 <- list(matrix("A"))
list12 <- NULL
list13 <- NULL
list14 <- NULL
list15 <- list(matrix("F"))
list16 <- NULL
list17 <- NULL
list18 <- NULL
list19 <- list(matrix("G"))
list20 <- NULL
list21 <- NULL
list22 <- NULL
list23 <- list(matrix("I"))
list24 <- NULL
(mylist <- list(list1, list2, list3, list4, list5,
list6, list7, list8, list9, list10,
list11, list12, list13, list14, list15,
list16, list17, list18, list19, list20,
list21, list22, list23, list24))
Which outputs the following:
[[1]]
[[1]][[1]]
[,1]
[1,] "2"
[[2]]
[[2]][[1]]
[,1]
[1,] "32"
[[3]]
[[3]][[1]]
[,1]
[1,] "C"
[[4]]
[[4]][[1]]
[,1]
[1,] "455"
[[5]]
[[5]][[1]]
[,1]
[1,] "34"
[[6]]
[[6]][[1]]
[,1]
[1,] "43"
[[7]]
[[7]][[1]]
[,1]
[1,] "D"
[[8]]
NULL
[[9]]
[[9]][[1]]
[,1]
[1,] "17"
[[10]]
NULL
[[11]]
[[11]][[1]]
[,1]
[1,] "A"
[[12]]
NULL
[[13]]
NULL
[[14]]
NULL
[[15]]
[[15]][[1]]
[,1]
[1,] "F"
[[16]]
NULL
[[17]]
NULL
[[18]]
NULL
[[19]]
[[19]][[1]]
[,1]
[1,] "G"
[[20]]
NULL
[[21]]
NULL
[[22]]
NULL
[[23]]
[[23]][[1]]
[,1]
[1,] "I"
[[24]]
NULL
In other threads, most people suggested unlist(), but unlisting my list does not let me select questionnaires by row (especially since NULL values are dropped when unlisting, so the original dimensions are not preserved):
unlist(mylist)
[1] "2" "32" "C" "455" "34" "43" "D" "17" "A" "F" "G" "I"
You can see it's tidier, but the items are not in the right order, so it's hard to put them back into a data frame. Some suggested converting to a matrix, which does not really solve the problem:
matrix(unlist(mylist))
[,1]
[1,] "2"
[2,] "32"
[3,] "C"
[4,] "455"
[5,] "34"
[6,] "43"
[7,] "D"
[8,] "17"
[9,] "A"
[10,] "F"
[11,] "G"
[12,] "I"
I've tried other solutions from the threads to no avail, e.g.:
do.call(rbind.data.frame, mylist) # doesn't work
as.data.frame(matrix(unlist(mylist),nrow=length(mylist),byrow=TRUE)) # doesn't work
Here are some related threads: 1, 2, 3, 4, 5, 6, 7, and 8.
Why does readMat() need to import MAT v5 format cell structures as lists rather than data frames? (It would save us so much trouble.)
I'm looking for a solution, ideally in base R, to transform the readMat() list object into a data frame. It should be automatable, since I have thousands of such files that I'm not going to edit, restructure, or save to a different format individually in Matlab. The number and location of NULL values vary, as does the length of each row (some questionnaires have more items than others). Thanks!
I still don't know why readMat() imports MAT v5 cell structures as lists, but I unexpectedly found a solution!
The function below extracts a specific row from that type of list, where list is your list, row is the row you want to extract, and nrow is the total number of rows (assuming you know these details):
matlab.row <- function(list, row, nrow) {
  # Take every nrow-th element, starting at the desired row
  unlist(list[seq(row, length(list), by = nrow)])
}
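To see why stepping by nrow picks out one row, here is a minimal sketch of the index arithmetic, assuming the 4 x 6 layout of the example (the names n_rows, n_cells, row3_idx, and pos are only illustrative). Because the cells are stored column by column, row 3 sits at positions 3, 7, 11, and so on:

```r
# Illustrative values for the example: 24 cells laid out as 4 rows x 6 columns
n_rows  <- 4
n_cells <- 24

# Positions visited when extracting row 3: start at 3, then step by the row count
row3_idx <- seq(3, n_cells, by = n_rows)
row3_idx
# [1]  3  7 11 15 19 23

# Cross-check: arrange the positions column by column and read off row 3
pos <- matrix(seq_len(n_cells), nrow = n_rows)
all(row3_idx == pos[3, ])
# [1] TRUE
```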
matlab.row(mylist, 1, 4)
[1] "2" "34" "17"
matlab.row(mylist, 2, 4)
[1] "32" "43"
matlab.row(mylist, 3, 4)
[1] "C" "D" "A" "F" "G" "I"
matlab.row(mylist, 4, 4)
[1] "455"
To get the full table, I had to tweak the function some more, where list is your list, max.len is the length of the longest row (the maximum number of items), and nrow is your total number of rows. Note that the result is a character matrix; wrap it in as.data.frame() if you need an actual data frame:
matlab.df <- function(list, max.len, nrow) {
  # Reuse the row extractor defined earlier
  matlab.row <- function(list, row, nrow) {
    unlist(list[seq(row, length(list), by = nrow)])
  }
  listA <- vector("list", nrow)  # Preallocate one slot per row
  for (i in seq_len(nrow)) {
    values <- matlab.row(list, i, nrow)
    # Pad each row with NAs up to max.len so all rows have equal length (very important part!)
    listA[[i]] <- c(values, rep(NA, max.len - length(values)))
  }
  do.call(rbind, listA)  # Bind the padded rows together and return a character matrix
}
matlab.df(list = mylist, max.len = 6, nrow = 4)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "2" "34" "17" NA NA NA
[2,] "32" "43" NA NA NA NA
[3,] "C" "D" "A" "F" "G" "I"
[4,] "455" NA NA NA NA NA
I found the solution thanks to a combination of these threads: 1, 2, 3, and 4.
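As a complementary sketch (not part of the solution above), the NULL cells can be replaced with NA before flattening, so every cell keeps its position and the number of columns does not have to be supplied. The helper names cell and cells_to_df are hypothetical, and the sketch assumes each non-empty cell holds a single value and that the number of rows is known. Unlike matlab.df, which packs the non-NULL values to the left, this keeps an NA wherever a NULL occurred, which may matter if NULLs can appear in the middle of a row:

```r
# Rebuild the example: non-empty cells are list(matrix(value)), empty cells are NULL
cell <- function(x) list(matrix(x))
mylist <- list(cell("2"),  cell("32"), cell("C"), cell("455"),
               cell("34"), cell("43"), cell("D"), NULL,
               cell("17"), NULL,       cell("A"), NULL,
               NULL,       NULL,       cell("F"), NULL,
               NULL,       NULL,       cell("G"), NULL,
               NULL,       NULL,       cell("I"), NULL)

# Hypothetical helper (not part of R.matlab): assumes one value per non-empty cell
cells_to_df <- function(cells, n_rows) {
  stopifnot(length(cells) %% n_rows == 0)
  # Replace each NULL with NA so the vector keeps one entry per cell
  flat <- vapply(cells, function(x) {
    if (is.null(x)) NA_character_ else as.character(unlist(x))[1]
  }, character(1))
  # matrix() fills column by column, matching the order of the imported list
  as.data.frame(matrix(flat, nrow = n_rows), stringsAsFactors = FALSE)
}

df <- cells_to_df(mylist, n_rows = 4)
df[3, ]  # select questionnaire 3 by row
```

The columns get the default names V1 to V6; all values stay character, as in the matrix version.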