Reputation: 1580
Say I have a tall dataframe with many rows per group, like so:
df <- data.frame(group = factor(rep(c("a","b","c"), each = 5)),
                 v1 = sample(1:100, 15, replace = TRUE),
                 v2 = sample(1:100, 15, replace = TRUE),
                 v3 = sample(1:100, 15, replace = TRUE))
What I want to do is split df into length(levels(df$group)) separate dataframes, e.g.,
df_a <- df[df$group=="a",]; df_b <- df[df$group == "b",] ; ...
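For what it's worth, the split step itself can be sketched with base R's split(), which returns one dataframe per factor level in a named list instead of separate df_a, df_b, ... objects. This is only a sketch under the assumption that df looks like the example above; the name dfs and the set.seed() call are illustrative additions, not part of the original code.

```r
# Example data, as in the question (seed added only for reproducibility)
set.seed(1)
df <- data.frame(group = factor(rep(c("a", "b", "c"), each = 5)),
                 v1 = sample(1:100, 15, replace = TRUE),
                 v2 = sample(1:100, 15, replace = TRUE),
                 v3 = sample(1:100, 15, replace = TRUE))

# One dataframe per level of df$group, returned as a named list
dfs <- split(df, df$group)

names(dfs)    # "a" "b" "c"
dfs[["a"]]    # the rows where group == "a"
```

Keeping the pieces in a list makes it easy to loop over names(dfs) later, whatever the number of groups.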
And then print each dataframe in a separate HTML/PDF/DOCX file (probably using Rmarkdown and knitr).
I want to do this because I have a large dataframe and want to create a personalized report for each group a, b, c, etc. Thanks.
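To make the goal concrete, here is a rough sketch of the one-report-per-group idea using a parameterized R Markdown document and rmarkdown::render(). It assumes the rmarkdown and knitr packages and pandoc are installed; the template text, the file names, and the params called group and data are all hypothetical choices for illustration.

```r
# Example data, as in the question
df <- data.frame(group = factor(rep(c("a", "b", "c"), each = 5)),
                 v1 = sample(1:100, 15, replace = TRUE),
                 v2 = sample(1:100, 15, replace = TRUE),
                 v3 = sample(1:100, 15, replace = TRUE))

# Write a small parameterized template (hypothetical content and file name)
fence <- strrep("`", 3)                      # chunk delimiter for the Rmd file
out_dir <- tempdir()                         # assumed output location
template <- file.path(out_dir, "group_report.Rmd")
writeLines(c(
  "---",
  "title: Group report",
  "params:",
  "  group: NA",
  "  data: NA",
  "---",
  "",
  "### Records received for group `r params$group`",
  "",
  paste0(fence, "{r, echo = FALSE}"),
  "knitr::kable(params$data, row.names = FALSE)",
  fence
), template)

# Render the template once per group, passing that group's rows as a parameter
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  for (g in levels(df$group)) {
    rmarkdown::render(template,
                      output_format = "html_document",
                      output_file = paste0("report_", g, ".html"),
                      output_dir = out_dir,
                      params = list(group = g, data = df[df$group == g, ]),
                      quiet = TRUE)
  }
}
```

Swapping output_format to "word_document" or "pdf_document" (with the file extension changed to match) should give DOCX or PDF instead; the PDF route additionally needs a LaTeX installation.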
Update (11/18/14)
Following @daroczig's advice in this thread and another thread, I attempted to make my own template that simply prints a nicely formatted table of all columns and rows per group, to use in place of the "correlations" template in the original sapply() call. I want to make my own template rather than just printing the nice table (e.g., the answer @Thomas graciously provided) because I'd like to build additional customization into the template once the simple printing works. Anyway, I've certainly butchered it:
<!--head
meta:
  title: Sample Report
  author: Nicapyke
  description: This is a demo
  packages: ~
inputs:
- name: eachgroup
  class: character
  standalone: TRUE
  required: TRUE
head-->
### Records received up to present for Group <%= eachgroup %>
<%=
pandoc.table(df[df$group == eachgroup, ])
%>
Then, after saving that as groupreport.rapport in my working directory, I wrote the following R code, modeled after @daroczig's response:
allgroups <- unique(df$group)
library(rapport)
for (eachgroup in allgroups) {
  rapport.docx("FILEPATHHERE", eachgroup = eachgroup)
}
I received the error:
Error in openFileInOS(f.out) : File not found!
I'm not sure what happened. I see from the pander documentation that this means it's looking for a system file, but that doesn't mean much to me. Anyway, this error doesn't get at the root of the problem, which is 1) what should go in the inputs section of the custom template's YAML header, and 2) which R code should go in the rapport template vs. in the R script.
I realize I may be making a number of errors that reveal my lack of experience with rapport and pander. Thanks for your patience!
N.B.:
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.8 dplyr_0.3.0.2 rapport_0.51 yaml_2.1.13 pander_0.5.1
plyr_1.8.1 lattice_0.20-29
loaded via a namespace (and not attached):
[1] assertthat_0.1 DBI_0.3.1 digest_0.6.4 evaluate_0.5.5 formatR_1.0 grid_3.1.2
[7] lazyeval_0.1.9 magrittr_1.0.1 parallel_3.1.2 Rcpp_0.11.3 reshape_0.8.5 stringr_0.6.2
[13] tools_3.1.2
Upvotes: 1
Views: 1416
Reputation: 28682
Slightly off-topic, but here is an R/markdown one-liner that produces separate reports from a report template:
> library(rapport)
> sapply(levels(df$group), function(g) rapport.html('correlations', data = df[df$group == g, ], vars = c('v1', 'v2', 'v3')))
Exported to */tmp/RtmpYyRLjf/rapport-correlations-1-0.[md|html]* under 0.683 seconds.
Exported to */tmp/RtmpYyRLjf/rapport-correlations-2-0.[md|html]* under 0.888 seconds.
Exported to */tmp/RtmpYyRLjf/rapport-correlations-3-0.[md|html]* under 1.063 seconds.
The rapport package can run (predefined or custom) report templates on any (sub)dataset in markdown, then export it to HTML/docx/PDF/other formats. For a quick demo, I've uploaded the resulting documents:
Upvotes: 2
Reputation: 44575
You can do this with by (or split) and xtable (from the xtable package). Here I create xtable objects of each subset, and then loop over them to print them to file:
library('xtable')
s <- by(df, df$group, xtable)
for(i in seq_along(s)) print(s[[i]], file = paste0('df',names(s)[i],'.tex'))
If you use the stargazer package, you can get a nice summary of the dataframe instead of the dataframe itself in just one line:
library('stargazer')
for (g in levels(df$group)) stargazer(df[df$group == g, ], out = paste0('df', g, '.tex'))  # one output file per group
You should be able to easily include each of these files in, e.g., a PDF report. You could also use HTML markup using either xtable or stargazer.
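As a rough illustration of the HTML variant with xtable, the sketch below passes type = "html" to print() for each subset. It assumes the xtable package is installed and that df has the structure from the question; writing to tempdir() and the df<group>.html file names are arbitrary choices for the example.

```r
library(xtable)

# Example data, as in the question
df <- data.frame(group = factor(rep(c("a", "b", "c"), each = 5)),
                 v1 = sample(1:100, 15, replace = TRUE),
                 v2 = sample(1:100, 15, replace = TRUE),
                 v3 = sample(1:100, 15, replace = TRUE))

out_dir <- tempdir()            # assumed output location
s <- split(df, df$group)        # named list: one dataframe per group

# Write each subset as an HTML table to its own file
for (g in names(s)) {
  print(xtable(s[[g]]), type = "html",
        file = file.path(out_dir, paste0("df", g, ".html")))
}
```

stargazer accepts type = "html" in the same spirit if a summary table is preferred over the raw rows.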
Upvotes: 1