Shaxi Liver

Reputation: 1120

Each cluster of a data frame on a separate page of a PDF file

I will use the mtcars data frame as an example. I would like to print two columns of the data frame on each page of a PDF file, depending on the cluster each row belongs to.

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3

Let's assume that column carb is the cluster column. As you can see, this sample has 4 different clusters, so what I would like to do is put the output of two columns (the car name and mpg) for each cluster on a separate page of the PDF.

The first page should look like this:

  1. Datsun 710 22.8
  2. Hornet 4 Drive 21.4
  3. Valiant 18.1

The second page should look like this:

  1. Hornet Sportabout 18.7
  2. Merc 240D 24.4
  3. Merc 230 22.8

The third page should look like this:

  1. Merc 450SE 16.4

I believe you can already tell what the last page should look like. Can you help me with that? My full data frame contains around 250 clusters in total.
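For reference, the grouping described above can be checked in base R with split(), which returns one data frame per carb value. This is a small illustrative sketch, not part of the original question; the names sample_cars and clusters are made up for the example.

```r
# The 12 rows shown in the table above are the first 12 rows of mtcars
sample_cars <- mtcars[1:12, ]

# One data frame per carb value; keeping only mpg preserves the car names
# as row names, which is exactly the "name value" pair wanted on each page
clusters <- split(sample_cars["mpg"], sample_cars$carb)

length(clusters)            # 4 clusters, i.e. 4 PDF pages
sapply(clusters, nrow)      # 3, 3, 1 and 5 rows for clusters 1 to 4
rownames(clusters[["1"]])   # "Datsun 710" "Hornet 4 Drive" "Valiant"
```

With the real data, length(clusters) gives the number of pages to expect (around 250 here).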

Upvotes: 2

Views: 1238

Answers (2)

Christoph

Reputation: 696

In addition to @baptiste's answer, here is a solution using rmarkdown. To print each table on a new PDF page, you have to set results='asis' in the chunk options, so that whatever the chunk prints is passed through as raw markdown. With cat("\n\n\\newpage") and cat("\n\n## ...") you can then dynamically create new pages and headers, and knitr::kable() produces nicely formatted tables.

---
title: "mtcars_clusters"
author: "Your name"
date: "3 August 2015"
output: pdf_document
---

```{r, echo=FALSE, results='asis'}
# one section per carb value: a header, the cluster's table, then a page break
DAT <- lapply(sort(unique(mtcars$carb)), function(cluster) {
  data <- subset(mtcars, carb == cluster)
  cat(paste("\n\n## Cluster", cluster))
  print(knitr::kable(data))
  cat("\n\n\\newpage")
})
```
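If you want the numbered "name value" lines from the question instead of a kable table, the same results='asis' idea can emit markdown lists directly. This is a sketch, not part of the answer: cluster_markdown is a hypothetical helper name, and the %.1f format assumes the value column is numeric. It only uses base R, so you can inspect the generated markdown in the console before knitting.

```r
# Hypothetical helper (not from the answer): for each cluster, print a
# markdown header, a numbered list of "row name value", then a page break.
cluster_markdown <- function(df, cluster_col, value_col) {
  # a factor cluster column is iterated as character values by for()
  for (cluster in sort(unique(df[[cluster_col]]))) {
    # rows of this cluster, keeping only the value column (row names survive)
    d <- df[df[[cluster_col]] == cluster, value_col, drop = FALSE]
    cat("\n\n## Cluster ", cluster, "\n\n", sep = "")
    cat(sprintf("%d. %s %.1f", seq_len(nrow(d)), rownames(d), d[[value_col]]),
        sep = "\n")
    # raw LaTeX page break, passed through by pandoc for PDF output
    cat("\n\n\\newpage\n")
  }
}

cluster_markdown(mtcars[1:12, ], "carb", "mpg")
```

Defined inside the results='asis' chunk, a call such as cluster_markdown(mtcars, "carb", "mpg") would take the place of the lapply() loop above.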

EDIT: Example with a larger data set

With respect to your comment on @baptiste's answer, this markdown approach also handles larger data sets better. Here is an example with the ChickWeight dataset, using ChickWeight$Diet as the cluster variable:

---
title: "ChickWeight_clusters"
author: "Your name"
date: "3 August 2015"
output: pdf_document
---

```{r, echo=FALSE, results='asis'}
DAT <- lapply(sort(unique(ChickWeight$Diet)), function(cluster) {
  data <- subset(ChickWeight, Diet == cluster)
  cat(paste("\n\n## Cluster", cluster))
  print(knitr::kable(data))
  cat("\n\n\\newpage")
})
```

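The output table is automatically split across pages, so you should see all rows. Also, if you only want to print specific columns, just subset data within print(knitr::kable(data)) accordingly; for example, print(knitr::kable(data["mpg"])) in the mtcars chunk prints only the car names (as row names) and mpg.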

Upvotes: 2

baptiste

Reputation: 77096

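grid.newpage() forces each table to appear on a new page:

pdf("multipage.pdf")
# invisible() stops lapply() from printing its list of return values
invisible(lapply(split(mtcars, mtcars$carb), function(d) {
  grid::grid.newpage()      # start a new PDF page for each cluster
  gridExtra::grid.table(d)  # draw the cluster's rows as a table
}))
dev.off()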
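As far as I know, grid.table() draws each cluster as a single table, so a cluster with more rows than fit on one page is cut off rather than continued (the limitation the rmarkdown answer refers to). A sketch that works around this by splitting each cluster into pieces of at most rows_per_page rows, one piece per page: chunk_rows and write_cluster_pdf are hypothetical helper names, the default of 30 rows per page is an assumption to tune for your page size, and gridExtra must be installed, as in the answer above.

```r
# Hypothetical helper: break one cluster into consecutive pieces of at most
# rows_per_page rows (30 is an assumed default, not a gridExtra setting)
chunk_rows <- function(d, rows_per_page = 30) {
  split(d, ceiling(seq_len(nrow(d)) / rows_per_page))
}

# Hypothetical helper: one or more PDF pages per cluster
write_cluster_pdf <- function(df, clusters, file, rows_per_page = 30) {
  pdf(file)
  on.exit(dev.off())  # close the PDF device even if drawing fails
  for (d in split(df, clusters)) {
    for (piece in chunk_rows(d, rows_per_page)) {
      grid::grid.newpage()          # every piece starts on a new page
      gridExtra::grid.table(piece)  # row names (car names) are drawn too
    }
  }
  invisible(file)
}
```

For the question's layout (car name plus mpg), write_cluster_pdf(mtcars["mpg"], mtcars$carb, "multipage.pdf") passes only the mpg column and uses carb purely for grouping.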

Upvotes: 4
