mcjudd

Reputation: 1580

Splitting a dataframe by group and printing group-specific rows to individual HTML files using pander and rapport

Say I have a tall dataframe with many rows per group, like so:

df <- data.frame(group = factor(rep(c("a","b","c"), each = 5)),
                 v1    = sample(1:100, 15, replace = TRUE),
                 v2    = sample(1:100, 15, replace = TRUE),
                 v3    = sample(1:100, 15, replace = TRUE))

What I want to do is split df into length(levels(df$group)) separate dataframes, e.g.,

df_a <- df[df$group=="a",]; df_b <- df[df$group == "b",] ; ...
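Instead of creating `df_a`, `df_b`, ... by hand, base R's `split()` returns a named list with one data frame per factor level, which is easier to loop over. Below is a minimal sketch that reuses the example `df`. The `set.seed()` call is an addition so the random values are reproducible, and the name `groups` is illustrative:

```r
set.seed(42)  # added for reproducibility; not part of the original example
df <- data.frame(group = factor(rep(c("a", "b", "c"), each = 5)),
                 v1    = sample(1:100, 15, replace = TRUE),
                 v2    = sample(1:100, 15, replace = TRUE),
                 v3    = sample(1:100, 15, replace = TRUE))

# One data frame per level of df$group, stored in a list named "a", "b", "c"
groups <- split(df, df$group)

names(groups)        # "a" "b" "c"
nrow(groups[["b"]])  # 5, the same rows as df[df$group == "b", ]
```

`groups[["a"]]` then plays the role of `df_a`, and `for (g in names(groups))` visits every group without naming each one.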

And then print each dataframe to a separate HTML/PDF/DOCX file (probably using R Markdown and knitr).

I want to do this because I have a large dataframe and want to create a personalized report for each group a, b, c, etc. Thanks.
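The split-then-loop pattern behind such per-group reports can be sketched with base R alone, before any templating package is involved. The sketch writes each group's rows to its own HTML file. The helper `write_group_html()` and the `report_<group>.html` file names are hypothetical, and the HTML is deliberately bare. knitr, R Markdown or rapport would produce much nicer output.

```r
set.seed(42)  # only so the example data are reproducible
df <- data.frame(group = factor(rep(c("a", "b", "c"), each = 5)),
                 v1    = sample(1:100, 15, replace = TRUE),
                 v2    = sample(1:100, 15, replace = TRUE),
                 v3    = sample(1:100, 15, replace = TRUE))

# Hypothetical helper: write one data frame as a bare-bones HTML table.
# Values are not HTML-escaped, which is fine for this numeric data.
write_group_html <- function(d, group, file) {
  header <- paste0("<tr>", paste0("<th>", names(d), "</th>", collapse = ""), "</tr>")
  cells  <- as.matrix(format(d))  # every column, factor included, as text
  rows   <- apply(cells, 1, function(r)
    paste0("<tr>", paste0("<td>", r, "</td>", collapse = ""), "</tr>"))
  html <- c("<html><body>",
            paste0("<h3>Records received for group ", group, "</h3>"),
            "<table>", header, rows, "</table>",
            "</body></html>")
  writeLines(html, file)
  invisible(file)
}

# Split once, then write one file per group (report_a.html, report_b.html, ...)
out_dir <- tempdir()
groups  <- split(df, df$group)
files   <- vapply(names(groups), function(g)
  write_group_html(groups[[g]], g, file.path(out_dir, paste0("report_", g, ".html"))),
  character(1))
```

`files` is a character vector of output paths, named by group. Replacing the body of the helper with a call to a report generator keeps the same loop structure.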

Update (11/18/14)

Following @daroczig's advice in this thread and another thread, I attempted to make my own template. It would simply print a nicely formatted table of all columns and rows for each group, and I would substitute it for the "correlations" template in the original sapply() call. I want my own template, rather than just printing the nice table (e.g., as in the answer @Thomas graciously provided), because I'd like to build additional customization into the template once the simple printing works. Anyway, I've certainly butchered it:

<!--head
meta:
  title: Sample Report
  author: Nicapyke
  description: This is a demo
  packages: ~
inputs:
- name: eachgroup
  class: character
  standalone: TRUE
  required: TRUE
head-->

### Records received up to present for Group <%= eachgroup %>

<%=
pandoc.table(df[df$group == eachgroup, ])
%>

Then, after saving that as groupreport.rapport in my working directory, I wrote the following R code, modeled after @daroczig's response:

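# the template input eachgroup is declared as character, so convert the factor levels
allgroups <- as.character(unique(df$group))

library(rapport)

for (eachgroup in allgroups) {
  # "FILEPATHHERE" stands for the path to groupreport.rapport
  rapport.docx("FILEPATHHERE", eachgroup = eachgroup)
}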

I received the error:

Error in openFileInOS(f.out) : File not found!

I'm not sure what happened. I see from the pander documentation that this means it's looking for a file on the system, but that doesn't mean much to me. Anyway, this error doesn't get at the root of the problem, which is 1) what should go in the inputs section of the custom template's YAML header, and 2) which R code should go in the rapport template vs. in the R script.

I realize I may be making a number of errors that reveal my lack of experience with rapport and pander. Thanks for your patience!

N.B.:

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] knitr_1.8       dplyr_0.3.0.2   rapport_0.51    yaml_2.1.13     pander_0.5.1   
[6] plyr_1.8.1      lattice_0.20-29

loaded via a namespace (and not attached):
[1] assertthat_0.1 DBI_0.3.1      digest_0.6.4   evaluate_0.5.5 formatR_1.0    grid_3.1.2    
 [7] lazyeval_0.1.9 magrittr_1.0.1 parallel_3.1.2 Rcpp_0.11.3    reshape_0.8.5  stringr_0.6.2 
[13] tools_3.1.2   


Answers (2)

daroczig

Reputation: 28682

A slightly off-topic, but still R/markdown, one-liner that produces separate reports from a report template:

> library(rapport)
> sapply(levels(df$group), function(g) rapport.html('correlations', data = df[df$group == g, ], vars = c('v1', 'v2', 'v3')))
Exported to */tmp/RtmpYyRLjf/rapport-correlations-1-0.[md|html]* under 0.683 seconds.
Exported to */tmp/RtmpYyRLjf/rapport-correlations-2-0.[md|html]* under 0.888 seconds.
Exported to */tmp/RtmpYyRLjf/rapport-correlations-3-0.[md|html]* under 1.063 seconds.

The rapport package can run (predefined or custom) report templates on any (sub)dataset, producing markdown, and then export the result to HTML, docx, PDF or other formats. For a quick demo, I've uploaded the resulting documents:


Thomas

Reputation: 44575

You can do this with by (or split) and xtable (from the xtable package). Here I create an xtable object for each subset and then loop over them, printing each one to its own file:

library('xtable')
s <- by(df, df$group, xtable)
for(i in seq_along(s)) print(s[[i]], file = paste0('df',names(s)[i],'.tex'))

If you use the stargazer package, you can get a nice summary of each group's data, instead of the data itself, in just one line:

library('stargazer')
# one call per group, so each file contains only that group's summary
for (g in levels(df$group)) stargazer(df[df$group == g, ], out = paste0('df', g, '.tex'))

You should be able to easily include each of these files in, e.g., a PDF report. Both packages can also produce HTML instead of LaTeX: use type = 'html' in print() for xtable objects, or in the stargazer() call.
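If you only need per-group summary statistics and want to avoid the extra dependency, a rough base-R stand-in is to apply a summary function to each element of `split()`. This is a swapped-in technique and does not reproduce stargazer's layout or LaTeX output. The sketch assumes the numeric columns are `v1`, `v2` and `v3`. The names `num_cols` and `group_summaries` are illustrative.

```r
set.seed(42)  # only so the example data are reproducible
df <- data.frame(group = factor(rep(c("a", "b", "c"), each = 5)),
                 v1    = sample(1:100, 15, replace = TRUE),
                 v2    = sample(1:100, 15, replace = TRUE),
                 v3    = sample(1:100, 15, replace = TRUE))
num_cols <- c("v1", "v2", "v3")

# For each group: one row per numeric variable with n, mean, sd, min and max
group_summaries <- lapply(split(df[num_cols], df$group), function(d) {
  data.frame(n    = colSums(!is.na(d)),
             mean = colMeans(d),
             sd   = vapply(d, sd, numeric(1)),
             min  = vapply(d, min, numeric(1)),
             max  = vapply(d, max, numeric(1)))
})

group_summaries[["a"]]
```

Each element is an ordinary data frame, so it can be passed on to xtable or written out with the same per-group loop as above.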

