Reputation: 1120
I will use the mtcars data frame as an example. I would like to print two columns of the data frame on each page of a PDF file, depending on the cluster to which the rows belong.
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C         17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE        16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Let's assume that column carb is the cluster column. As you can see, there are 4 different clusters, so I would like to put the output of two columns (starting from the first one) for each cluster on a separate page of the PDF.
First page should look like:
Second page should look like:
Third page should look like:
I believe you can already see what the last page should look like. Can you help me with that? My data frame contains around 250 clusters in total.
Upvotes: 2
Views: 1238
Reputation: 696
In addition to @baptiste's answer, here is a solution using rmarkdown. To print each table on a new PDF page you have to set results='asis' in the chunk options. With cat("\n\n\\newpage") and cat("\n\n##") you can then dynamically create new pages and headers. knitr::kable() provides a nice table output.
---
title: "mtcars_clusters"
author: "Your name"
date: "3 August 2015"
output: pdf_document
---
```{r, echo=FALSE, results='asis'}
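# One section per cluster; assigning to DAT keeps lapply()'s return value out of the output
DAT <- lapply(sort(unique(mtcars$carb)), function(cluster) {
  data <- subset(mtcars, carb == cluster)
  cat(paste("\n\n## Cluster", cluster))  # markdown header for this cluster
  print(knitr::kable(data))              # table emitted as raw markdown
  cat("\n\n\\newpage")                   # LaTeX page break after the table
})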
```
EDIT: Example with a larger data set
With respect to your comment on @baptiste's answer, this markdown approach also handles larger data sets better. Here is an example with the ChickWeight dataset, using ChickWeight$Diet as the cluster variable:
---
title: "ChickWeight_clusters"
author: "Your name"
date: "3 August 2015"
output: pdf_document
---
```{r, echo=FALSE, results='asis'}
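# Same pattern as above, with Diet as the cluster variable
DAT <- lapply(sort(unique(ChickWeight$Diet)), function(cluster) {
  data <- subset(ChickWeight, Diet == cluster)
  cat(paste("\n\n## Cluster", cluster))  # markdown header for this cluster
  print(knitr::kable(data))              # table emitted as raw markdown
  cat("\n\n\\newpage")                   # LaTeX page break after the table
})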
```
The output table is automatically split across pages, so you should see all rows. Also, if you only want to print specific columns, just subset data accordingly inside print(knitr::kable(data)).
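For the original question, which asks for only two columns per cluster, the subsetting could look like the sketch below. It assumes that "two columns" means the first two columns of mtcars (mpg and cyl); the helper name cluster_tables is made up for illustration and is not part of knitr or rmarkdown.

```r
# Hypothetical helper: one data frame per cluster, restricted to `cols`.
cluster_tables <- function(data, cluster_col, cols = 1:2) {
  # split() groups the rows by the value of the cluster column
  groups <- split(data, data[[cluster_col]])
  # keep only the requested columns; drop = FALSE keeps a data frame
  lapply(groups, function(d) d[, cols, drop = FALSE])
}

tables <- cluster_tables(mtcars, "carb")
names(tables)        # cluster labels, one per page
head(tables[["4"]])  # mpg and cyl for the cars with carb == 4
```

Inside the results='asis' chunk, each element of tables can then be passed to print(knitr::kable(...)) in place of data.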
Upvotes: 2
Reputation: 77096
grid.newpage() forces each table to appear on a new page:
pdf("multipage.pdf")
lapply(split(mtcars, mtcars$carb), function(d) {
  grid::grid.newpage()
  gridExtra::grid.table(d)
})
dev.off()
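With around 250 clusters, some tables may be too long for a single page, and grid.table() does not break a table across pages by itself. One possible workaround, sketched below, is to cut each cluster into fixed-size row chunks and draw every chunk on its own page. The helper names and the value of 25 rows per page are assumptions to be tuned, not part of the answer above.

```r
# Hypothetical helper: split a data frame into chunks of at most n rows.
chunk_rows <- function(d, n = 25) {
  split(d, ceiling(seq_len(nrow(d)) / n))
}

# Hypothetical helper: one page per chunk, clusters in sorted order.
# Requires the gridExtra package; rows_per_page = 25 is a guess.
write_cluster_pdf <- function(file, data, cluster_col, rows_per_page = 25) {
  pdf(file)
  on.exit(dev.off())  # close the device even if drawing fails
  for (d in split(data, data[[cluster_col]])) {
    for (page in chunk_rows(d, rows_per_page)) {
      grid::grid.newpage()
      gridExtra::grid.table(page)
    }
  }
  invisible(file)
}

# Example call (writes a file to the working directory):
# write_cluster_pdf("multipage.pdf", mtcars, "carb")
```

Clusters with fewer rows than rows_per_page still get exactly one page, so the output matches the original loop for small data.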
Upvotes: 4