Splitting a dataframe by group and printing group-specific rows to individual HTML files using pander and rapport

Question

Say I have a tall dataframe with many rows per group, like so:

df <- data.frame(group = factor(rep(c("a","b","c"), each = 5)),
                 v1    = sample(1:100, 15, replace = TRUE),
                 v2    = sample(1:100, 15, replace = TRUE),
                 v3    = sample(1:100, 15, replace = TRUE))

What I want to do is split df into length(levels(df$group)) separate dataframes, e.g.,

df_a <- df[df$group == "a", ]; df_b <- df[df$group == "b", ]; ...
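
For reference, base R's split() produces all of the per-group data frames in one call, returning a named list rather than separate df_a, df_b, ... objects. A minimal sketch on the example data (the set.seed() call and the name by_group are additions here, used only for illustration):

```r
# Rebuild the example data; set.seed() is added only for reproducibility
set.seed(1)
df <- data.frame(group = factor(rep(c("a", "b", "c"), each = 5)),
                 v1    = sample(1:100, 15, replace = TRUE),
                 v2    = sample(1:100, 15, replace = TRUE),
                 v3    = sample(1:100, 15, replace = TRUE))

# split() returns a named list with one data frame per level of df$group
by_group <- split(df, df$group)

names(by_group)   # "a" "b" "c"
by_group[["a"]]   # same rows as df[df$group == "a", ]
```

Keeping the pieces in a list means the same code works for any number of groups, and each piece can be handed to a reporting function with lapply() or a for loop.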

I then want to print each dataframe to a separate HTML/PDF/DOCX file (probably using R Markdown and knitr).

I want to do this because I have a large dataframe and want to create a personalized report for each group a, b, c, etc. Thanks.
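
To make the goal concrete, here is a rough sketch of the "one file per group" loop using base R only. The helper df_to_html() and the output directory are hypothetical names invented for this illustration, and the helper assumes cell values need no HTML escaping (true for the numeric example data). In a real report the table rendering would more likely come from knitr, pander or rapport; only the looping pattern is the point here.

```r
# Example data from the question (seed added only for reproducibility)
set.seed(1)
df <- data.frame(group = factor(rep(c("a", "b", "c"), each = 5)),
                 v1    = sample(1:100, 15, replace = TRUE),
                 v2    = sample(1:100, 15, replace = TRUE),
                 v3    = sample(1:100, 15, replace = TRUE))

# Hypothetical helper: render a data frame as a bare-bones HTML page.
# Assumes no cell value contains characters that need HTML escaping.
df_to_html <- function(d, title) {
  header <- paste0("<th>", names(d), "</th>", collapse = "")
  # One character vector of <td> cells per column, pasted together row-wise
  cells <- lapply(d, function(col) paste0("<td>", as.character(col), "</td>"))
  rows  <- paste0("<tr>", do.call(paste0, unname(cells)), "</tr>")
  c("<html><body>",
    paste0("<h3>", title, "</h3>"),
    "<table>",
    paste0("<tr>", header, "</tr>"),
    rows,
    "</table>",
    "</body></html>")
}

# Hypothetical output location; any writable directory would do
out_dir <- file.path(tempdir(), "group_reports")
dir.create(out_dir, showWarnings = FALSE)

# Write one HTML file per group level
for (g in levels(df$group)) {
  out_file <- file.path(out_dir, paste0("group_", g, ".html"))
  writeLines(df_to_html(df[df$group == g, ], paste("Group", g)), out_file)
}

list.files(out_dir)
```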

Update (11/18/14)

Following @daroczig's advice in this thread and another thread, I tried to write my own template that simply prints a nicely formatted table of all columns and rows for each group, to use in place of the "correlations" template in the original sapply() call. I want my own template, rather than just printing the table (e.g., with the answer @Thomas graciously provided), because I'd like to build further customization into it once the simple printing works. Anyway, I've certainly butchered it:



### Records received up to present for Group <%= eachgroup %>

<%=
pandoc.table(df[df$group == eachgroup, ])
%>

Then, after saving that as groupreport.rapport in my working directory, I wrote the following R code, modeled after @daroczig's response:

allgroups <- unique(df$group)

library(rapport)


for (eachgroup in allgroups) {
  rapport.docx("FILEPATHHERE", eachgroup = eachgroup)
}

I received the error:

Error in openFileInOS(f.out) : File not found!

I'm not sure what happened. The pander documentation suggests this means it is looking for a system file, but that doesn't mean much to me. In any case, the error doesn't get at the root of the problem, which is 1) what should go in the input section of the custom template's YAML header, and 2) which R code belongs in the rapport template and which in the R script.

I realize I may be making a number of errors that reveal my lack of experience with rapport and pander. Thanks for your patience!

N.B.:

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] knitr_1.8       dplyr_0.3.0.2   rapport_0.51    yaml_2.1.13     pander_0.5.1   
[6] plyr_1.8.1      lattice_0.20-29

loaded via a namespace (and not attached):
 [1] assertthat_0.1 DBI_0.3.1      digest_0.6.4   evaluate_0.5.5 formatR_1.0    grid_3.1.2    
 [7] lazyeval_0.1.9 magrittr_1.0.1 parallel_3.1.2 Rcpp_0.11.3    reshape_0.8.5  stringr_0.6.2 
[13] tools_3.1.2

daroczig · Accepted Answer

Slightly off-topic, but here is an R/markdown one-liner that generates separate reports from a report template:

> library(rapport)
> sapply(levels(df$group), function(g) rapport.html('correlations', data = df[df$group == g, ], vars = c('v1', 'v2', 'v3')))
Exported to */tmp/RtmpYyRLjf/rapport-correlations-1-0.[md|html]* under 0.683 seconds.
Exported to */tmp/RtmpYyRLjf/rapport-correlations-2-0.[md|html]* under 0.888 seconds.
Exported to */tmp/RtmpYyRLjf/rapport-correlations-3-0.[md|html]* under 1.063 seconds.

The rapport package can run predefined or custom report templates on any dataset or subset of it, producing markdown that it then exports to HTML, docx, PDF, or other formats. For a quick demo, I've uploaded the resulting documents:
