Reputation: 6657

arrange multiple graphs using a for loop in ggplot2

I want to produce a pdf which shows multiple graphs, one for each NetworkTrackingPixelId. I have a data frame similar to this:

> head(data)
  NetworkTrackingPixelId                           Name       Date Impressions
1                   2421                    Rubicon RTB 2014-02-16      168801
2                   2615                     Google RTB 2014-02-16     1215235
3                   3366                      OpenX RTB 2014-02-16      104419
4                   3606                   AppNexus RTB 2014-02-16      170757
5                   3947                   Pubmatic RTB 2014-02-16       68690
6                   4299            Improve Digital RTB 2014-02-16         701

I was thinking of using a script similar to the one below:

# create a vector which stores the NetworkTrackingPixelIds
tp <- data %.%
        group_by(NetworkTrackingPixelId) %.%
        select(NetworkTrackingPixelId)

# create a for loop to print the line graphs
for (i in tp) {
      print(ggplot(data[which(data$NetworkTrackingPixelId == i), ], aes(x = Date, y = Impressions)) + geom_point() + geom_line())
    }

I was expecting this command to produce many graphs, one for each NetworkTrackingPixelId. Instead, the result is a single graph that aggregates all the NetworkTrackingPixelIds.

Another thing I've noticed is that the variable tp is not a real vector.

> is.vector(tp)
[1] FALSE

Even if I force it..

tp <- as.vector(data %.%
        group_by(NetworkTrackingPixelId) %.%
        select(NetworkTrackingPixelId))
> is.vector(tp)
[1] FALSE
> str(tp)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 1397 obs. of  1 variable:
 $ NetworkTrackingPixelId: int  2421 2615 3366 3606 3947 4299 4429 4786 6046 6286 ...
 - attr(*, "vars")=List of 1
  ..$ : symbol NetworkTrackingPixelId
 - attr(*, "drop")= logi TRUE
 - attr(*, "indices")=List of 63
  ..$ : int  24 69 116 162 205 253 302 351 402 454 ...
  ..$ : int  1 48 94 140 184 232 281 330 380 432 ...

[I've trimmed this output a bit]

 - attr(*, "group_sizes")= int  29 29 2 16 29 1 29 29 29 29 ...
 - attr(*, "biggest_group_size")= int 29
 - attr(*, "labels")='data.frame':  63 obs. of  1 variable:
  ..$ NetworkTrackingPixelId: int  8799 2615 8854 8869 4786 7007 3947 9109 9126 9137 ...
  ..- attr(*, "vars")=List of 1
  .. ..$ : symbol NetworkTrackingPixelId

Upvotes: 5

Answers (4)

user3389288

Reputation: 1024

I think you would be better off writing a function for plotting, then using lapply for every Network Tracking Pixel.

For example, your function might look like:

plot.function <- function(ntpid){
  sub = subset(dataset, dataset$networktrackingpixelid == ntpid)
  ggobj = ggplot(data=sub, aes(...)) + geom...
  ggsave(filename=sprintf("%s.pdf", ntpid))
}
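A filled-in version of that idea might look like the sketch below. It assumes the column names from the question (NetworkTrackingPixelId, Date, Impressions). The sample data frame, the helper name plot_one, and the temporary output directory are made up for illustration.

```r
library(ggplot2)

# Hypothetical sample data using the column names from the question
data <- data.frame(
  NetworkTrackingPixelId = rep(c(2421, 2615, 3366), each = 5),
  Date = rep(as.Date("2014-02-16") + 0:4, times = 3),
  Impressions = c(168801, 170220, 165400, 171900, 169300,
                  1215235, 1198000, 1230100, 1224500, 1209800,
                  104419, 101200, 108700, 99800, 103300)
)

out_dir <- tempdir()  # assumption: write the PDFs to a temporary directory

# plot_one is an illustrative name, not taken from the answer above
plot_one <- function(ntpid) {
  # keep only the rows that belong to this id
  sub <- subset(data, NetworkTrackingPixelId == ntpid)
  p <- ggplot(sub, aes(x = Date, y = Impressions)) +
    geom_point() + geom_line()
  out_file <- file.path(out_dir, sprintf("%s.pdf", ntpid))
  # pass the plot explicitly instead of relying on the last plot created
  ggsave(filename = out_file, plot = p, width = 7, height = 5)
  out_file
}

# one call (and one PDF) per distinct id
files <- vapply(unique(data$NetworkTrackingPixelId), plot_one, character(1))
```

Looping over unique ids, rather than over the whole column, is what keeps this to one file per NetworkTrackingPixelId.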

It would be helpful for you to put a reproducible example, but I hope this works! Not sure about the vector issue though..
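On the vector issue, here is a small sketch of what may be going on. The data frame below is a hypothetical stand-in for the question's data. The result of select() is still a data frame, and a for loop over a data frame visits its columns, so the loop body runs once with the whole column in i. The comparison against that column is then TRUE for every row, which would explain the single aggregated graph.

```r
# Hypothetical stand-in for the question's data
df <- data.frame(NetworkTrackingPixelId = c(2421, 2615, 2421, 3366))

# Selecting a column this way keeps a data frame, like the select() result
tp_df <- df["NetworkTrackingPixelId"]
is.vector(tp_df)   # FALSE

# for() over a data frame iterates over columns, not rows
n_iter <- 0
for (i in tp_df) {
  n_iter <- n_iter + 1   # i holds the entire column here
}
n_iter             # 1

# Extracting the column and de-duplicating gives a plain vector of ids
ids <- unique(df$NetworkTrackingPixelId)
is.vector(ids)     # TRUE
length(ids)        # 3
```

Looping over ids instead of tp_df gives one iteration per distinct id.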

Cheers!

Upvotes: 0

jlhoward

Reputation: 59375

Unless I'm missing something, generating plots by a subsetting variable is very simple. You can use split(...) to split the original data into a list of data frames by NetworkTrackingPixelId, and then pass those to ggplot using lapply(...). Most of the code below is just to create a sample dataset.

library(ggplot2)

# create sample data
set.seed(1)
names <- c("Rubicon","Google","OpenX","AppNexus","Pubmatic")
dates <- as.Date("2014-02-16")+1:10
df <- data.frame(NetworkTrackingPixelId=rep(1:5,each=10),
                 Name=sample(names,50,replace=TRUE),
                 Date=dates,
                 Impressions=sample(1000:10000,50))
# end create sample data

pdf("plots.pdf")
# print() each plot explicitly so every page is drawn even when the script is source()d
invisible(lapply(split(df,df$NetworkTrackingPixelId),
       function(gg) print(ggplot(gg,aes(x = Date, y = Impressions)) + 
          geom_point() + geom_line()+
          # the id is constant within a piece, so use its first value for the title
          ggtitle(paste("NetworkTrackingPixelId:",gg$NetworkTrackingPixelId[1])))))
dev.off()

This generates a pdf containing 5 plots, one for each NetworkTrackingPixelId.
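Since the question asks about a for loop, the same idea can also be written that way. This is only a sketch: the sample data and the output file name are invented for illustration. The point to note is that nothing is auto-printed inside a loop, so each plot has to be passed to print() while the pdf device is open.

```r
library(ggplot2)

# Hypothetical sample data with the question's column names
df <- data.frame(
  NetworkTrackingPixelId = rep(1:3, each = 4),
  Date = rep(as.Date("2014-02-16") + 1:4, times = 3),
  Impressions = c(5200, 4800, 6100, 5900,
                  9100, 8700, 9900, 9400,
                  1500, 1700, 1300, 1600)
)

out_file <- file.path(tempdir(), "plots_loop.pdf")  # assumption: temporary file
ids <- unique(df$NetworkTrackingPixelId)            # plain vector of distinct ids

pdf(out_file)
for (id in ids) {
  piece <- df[df$NetworkTrackingPixelId == id, ]
  p <- ggplot(piece, aes(x = Date, y = Impressions)) +
    geom_point() + geom_line() +
    ggtitle(paste("NetworkTrackingPixelId:", id))
  print(p)  # explicit print() draws one page per id
}
dev.off()
```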

Upvotes: 0

JBecker

Reputation: 814

I recently had a project that required producing a lot of individual PNGs, one for each record. I got a huge speed-up from some pretty simple parallelization. I am not sure whether this is faster than the dplyr or data.table technique, but it may be worth trying:

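library(foreach)
library(doParallel)
dir.create('images', showWarnings = FALSE)  # the output directory must exist
workers <- makeCluster(4)
registerDoParallel(workers) 
foreach(i = seq_len(nrow(mtcars)), .packages=c('ggplot2')) %dopar% {
  j <- qplot(wt, mpg, data = mtcars[i,])
  # include the row index in the name so rows sharing a gear do not overwrite each other
  png(file=file.path(getwd(), 'images', paste0(i, '_', mtcars[i, 'gear'], '.png')))
  print(j)
  dev.off()
}
stopCluster(workers)  # release the worker processes when done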

Upvotes: 1

Ramnath

Reputation: 55695

Since I don't have your dataset, I will use the mtcars dataset to illustrate how to do this using dplyr and data.table. Both packages are the finest examples of the split-apply-combine paradigm in rstats. Let me explain:

Step 1: Split data by gear

dplyr uses the function group_by
data.table uses argument by

Step 2: Apply a function

dplyr uses do, to which you can pass a function that operates on each piece x.
data.table evaluates the variables passed to the function in the context of each piece.

Step 3: Combine

There is no combine step here, since we are saving the charts created to file.

library(dplyr)
mtcars %.%
  group_by(gear) %.%
  do(function(x){ggsave(
    filename = sprintf("gear_%s.pdf", unique(x$gear)), qplot(wt, mpg, data = x)
  )})

library(data.table)
mtcars_dt = data.table(mtcars)
mtcars_dt[,ggsave(
  filename = sprintf("gear_%s.pdf", unique(gear)), qplot(wt, mpg)),
  by = gear
]
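The %.% operator and the do(function(x) ...) form come from early dplyr releases; later versions use %>% instead. A sketch of the same one-file-per-group idea for a newer dplyr, assuming a version that provides group_walk(), could look like this. The temporary output directory and the variable names are illustrative choices.

```r
library(dplyr)
library(ggplot2)

out_dir <- tempdir()  # assumption: write the PDFs to a temporary directory

mtcars %>%
  group_by(gear) %>%
  group_walk(function(piece, key) {
    # piece: the rows of one group; key: a one-row tibble holding that group's gear
    p <- ggplot(piece, aes(x = wt, y = mpg)) + geom_point()
    ggsave(file.path(out_dir, sprintf("gear_%s.pdf", key$gear)),
           plot = p, width = 7, height = 5)
  })

# the files that should now exist, one per gear value
saved <- file.path(out_dir, sprintf("gear_%s.pdf", sort(unique(mtcars$gear))))
```

group_walk() is called for its side effects only, which matches the "no combine step" situation described above.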

UPDATE: To save all plots into one PDF, here is a quick solution.

plots = mtcars %.%
  group_by(gear) %.%
  do(function(x) {
    qplot(wt, mpg, data = x)
  })

pdf('all.pdf')
invisible(lapply(plots, print))
dev.off()
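With a newer dplyr, the single-PDF variant might be sketched as follows. It assumes group_split() is available and uses a temporary output file; both the file name and the plot titles are illustrative additions.

```r
library(dplyr)
library(ggplot2)

# Build one plot per gear value and keep them in a list
plots <- mtcars %>%
  group_split(gear) %>%
  lapply(function(piece) {
    ggplot(piece, aes(x = wt, y = mpg)) +
      geom_point() +
      ggtitle(paste("gear:", unique(piece$gear)))
  })

out_file <- file.path(tempdir(), "all.pdf")  # assumption: temporary file

# Every print() call adds one page to the open pdf device
pdf(out_file)
invisible(lapply(plots, print))
dev.off()
```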

Upvotes: 13
