armipunk

Reputation: 458

How to convert lapply output to a single matrix in R

I have a list of data frames, organized by year. I am using lapply to get the summary for a single variable in each data frame. The output follows the list and gives a summary for each year, one by one. However, I want the output in the form of a single table with years for rows. How do I do this? An example using the iris dataset shows my problem:

x <- split(iris$Sepal.Length, iris$Species)
lapply(x, summary)

And the output is:

$setosa
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   4.300   4.800   5.000   5.006   5.200   5.800  

Similarly for the other two.

I want the output organized as a single table, like the one sapply() produces:

> sapply(x, summary)
        setosa versicolor virginica
Min.     4.300      4.900     4.900
1st Qu.  4.800      5.600     6.225
Median   5.000      5.900     6.500
Mean     5.006      5.936     6.588
3rd Qu.  5.200      6.300     6.900
Max.     5.800      7.000     7.900

But with setosa, versicolor, virginica (or years in my case) on the left and Min... Max up top. I can flip the axes around in ggplot, but reading the table as-is is more intuitive with the years on the left. I came across a number of discussions about converting lapply output but the ones I came across were all measuring a single stat like mean or median. Thanks.

Upvotes: 0

Views: 1656

Answers (2)

Uwe

Reputation: 42564

If you have a large data.frame, I recommend not splitting it into pieces but using data.table to group by year instead. With the iris data set, this could be done as follows:

library(data.table)
setDT(copy(iris))[, as.list(summary(Sepal.Length)), by = Species]
#      Species Min. 1st Qu. Median  Mean 3rd Qu. Max.
#1:     setosa  4.3   4.800    5.0 5.006     5.2  5.8
#2: versicolor  4.9   5.600    5.9 5.936     6.3  7.0
#3:  virginica  4.9   6.225    6.5 6.588     6.9  7.9

as.list() turns each element of the summary() output into its own column, so each group ends up as a single row, as requested.

The result is a data.table (not a matrix) which can be used directly in a subsequent ggplot() call.

Note that copy(iris) is only required here because the iris data set is locked to prevent modifying its variable bindings. With your own data.frame df you would simply use setDT(df) to coerce to data.table without copying.
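If the data already sit in a list of data frames, one per year (as described in the question), they can be stacked first and then grouped. The following is only a minimal sketch: the list `dfs`, its year names, and the column `value` are made-up stand-ins for the real data, and it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical stand-in for the real data: a named list of data frames,
# one per year, each holding the variable of interest in column `value`
dfs <- list(
  "2001" = data.frame(value = c(1, 2, 3, 4)),
  "2002" = data.frame(value = c(2, 4, 6, 8)),
  "2003" = data.frame(value = c(3, 6, 9, 12))
)

# Stack the list into one data.table; idcol stores the list names in `year`
long <- rbindlist(dfs, idcol = "year")

# One row per year, one column per summary statistic
res <- long[, as.list(summary(value)), by = year]
res
```

Because the list names are character strings, `year` ends up as a character column here; convert it with as.integer() if a numeric year is needed.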

Add-on

The OP mentioned that the result will be used for plotting with ggplot2. ggplot2 works best when data are provided in long format. Reshaping a data.table from wide to long format can be done conveniently with melt():

wideDT <- setDT(copy(iris))[, as.list(summary(Sepal.Length)), by = Species]
longDT <- melt(wideDT, id.vars = "Species")
longDT
#       Species variable value
# 1:     setosa     Min. 4.300
# 2: versicolor     Min. 4.900
# 3:  virginica     Min. 4.900
# 4:     setosa  1st Qu. 4.800
# 5: versicolor  1st Qu. 5.600
# 6:  virginica  1st Qu. 6.225
# 7:     setosa   Median 5.000
# 8: versicolor   Median 5.900
# 9:  virginica   Median 6.500
#10:     setosa     Mean 5.006
#11: versicolor     Mean 5.936
#12:  virginica     Mean 6.588
#13:     setosa  3rd Qu. 5.200
#14: versicolor  3rd Qu. 6.300
#15:  virginica  3rd Qu. 6.900
#16:     setosa     Max. 5.800
#17: versicolor     Max. 7.000
#18:  virginica     Max. 7.900

Upvotes: 1

Rich Scriven

Reputation: 99361

This seems like a good time to use by(). It eliminates the need for the call to split(), fits in one line, and, once its per-group results are combined with do.call(rbind, ...), returns a matrix.

with(iris, do.call(rbind, by(Sepal.Length, Species, summary)))
#            Min. 1st Qu. Median  Mean 3rd Qu. Max.
# setosa      4.3   4.800    5.0 5.006     5.2  5.8
# versicolor  4.9   5.600    5.9 5.936     6.3  7.0
# virginica   4.9   6.225    6.5 6.588     6.9  7.9

If you still prefer the manual split-apply-combine approach, it would be:

do.call(rbind, lapply(x, summary))
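Applied to the situation in the question (a list of data frames, one per year, with a summary of a single variable), the same idea might look like the sketch below. The list `years` and the column `value` are invented for illustration; substitute your own names.

```r
# Hypothetical stand-in for the real data: a named list of data frames,
# one per year, each holding the variable of interest in column `value`
years <- list(
  "2001" = data.frame(value = c(1, 2, 3, 4)),
  "2002" = data.frame(value = c(2, 4, 6, 8)),
  "2003" = data.frame(value = c(3, 6, 9, 12))
)

# Summarise the variable in each data frame, then stack the
# per-year summaries as the rows of one matrix
tab <- do.call(rbind, lapply(years, function(d) summary(d$value)))
tab
```

The list names become the row names and the summary labels become the column names. Transposing the sapply() result, as in t(sapply(years, function(d) summary(d$value))), should give the same layout. One caveat: if only some years contain missing values, summary() adds an NA's entry for just those years, so the per-year vectors no longer have the same length and will not stack cleanly.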

Upvotes: 1
