BenH

Reputation: 167

How to loop through data sets to graph particular columns only?

I have the graph itself working. The challenge is that I use the exact same code to graph multiple data sets (really, subsets of one LARGE data set), but I can't get the looping code right so that $ refers to each data set in turn.

Data sets, df1, df2, df3... of the form:

OBSDATE     REGION  AVG_RESP  P10  P90
2012-02-01  APAC    1.276     0.78 3.45
2012-02-01  EMEA    2.341     1.23 5.67
2012-02-02  APAC    1.343     0.89 3.21
2012-02-02  EMEA    2.473     1.37 5.98

The graph is more complex, but like this:

avgMx <- quantile(df1$P90,0.95)
ggplot(df1, aes(x=OBSDATE, y=AVG_RESP)) +
   coord_cartesian(ylim=c(0, avgMx)) +
   geom_ribbon(aes(ymin=P10, ymax=P90), fill="gray60", alpha=0.33) +
   geom_line(aes(x=OBSDATE, y=AVG_RESP), color="#007DB1", size=0.5) +
   facet_wrap(~REGION)

If I define a vector or a list with the data set names (both fail with the same error messages), I can't get a loop to compute any descriptive values, such as the quantile above or even a max.

filenames <- c("df1","df2","df3")

I would like to get something like this to work

for (i in filenames) {
   quantile(i$AVG_RESP,0.95)
   max(i$AVG_RESP)
}

But I get the error "$ operator is invalid for atomic vectors". Searching for that message hasn't turned up anything usable.

So, I can get this to work:

max(df1$AVG_RESP) or max(df1['AVG_RESP'])

Both return 2.473 for the data above. However, this doesn't fly:

for (i in filenames) max(i['AVG_RESP'])

It prints nothing. Changing it to this:

for (i in filenames) print(max(i['AVG_RESP']))

Gives instances of NA.

I'm completely stuck. Any help would be tremendously appreciated!

EDIT: I fixed the data that was causing errors - should be reproducible now.

Upvotes: 0

Views: 6280

Answers (2)

Tyler Rinker

Reputation: 109864

Your code isn't reproducible so this is my best guess at what you want:

df1 <- df2 <- df3 <- read.table(text="OBSDATE     REGION  AVG_RESP  P10  P90
2012-02-01  APAC    1.276     0.78 3.45
2012-02-01  EMEA    2.341     1.23 5.67
2012-02-02  APAC    1.343     0.89 3.21
2012-02-02  EMEA    2.473     1.37 5.98
2012-02-01  APAC    1.276     0.78 3.45
2012-02-01  EMEA    2.341     1.23 5.67
2012-02-02  APAC    1.343     0.89 3.21
2012-02-02  EMEA    2.473     1.37 5.98
2012-02-01  APAC    1.276     0.78 3.45
2012-02-01  EMEA    2.341     1.23 5.67
2012-02-02  APAC    1.343     0.89 3.21
2012-02-02  EMEA    2.473     1.37 5.98", header=TRUE)

info <- function(dataframe){
    c(quantile(dataframe$AVG_RESP,0.95), max(dataframe$AVG_RESP))
}

LIST <- list(df1, df2, df3)
lapply(LIST, info)
# Or use sapply if you want it to return a matrix
sapply(LIST, info)

R does have loops, but applying a function over a list like this is the more idiomatic R way of doing things.
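
If the data frames already exist under the names held in the question's filenames vector, one possible way to build the list is base R's mget(), which returns a named list, so the results stay labelled by data set. This is only a minimal sketch: the toy df1, df2, df3 below and the q95/max labels are made up for illustration.

```r
# Toy stand-ins for the real data sets (values are illustrative only)
df1 <- data.frame(AVG_RESP = c(1.276, 2.341, 1.343, 2.473))
df2 <- data.frame(AVG_RESP = c(0.5, 1.5, 2.5))
df3 <- data.frame(AVG_RESP = c(3.0, 4.0))

filenames <- c("df1", "df2", "df3")

# mget() looks up each name and returns a named list of the objects
datasets <- mget(filenames)

# Same idea as info() above, with labels added (labels are an assumption)
info <- function(dataframe) {
    c(q95 = unname(quantile(dataframe$AVG_RESP, 0.95)),
      max = max(dataframe$AVG_RESP))
}

# One column per data set, one row per statistic
stats <- sapply(datasets, info)
print(stats)
```

Because the list is named, a value can then be picked out with something like stats["max", "df1"].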

Upvotes: 3

Brian Diggs

Reputation: 58825

In your loop, i is a character string (the name of a data frame), not the data frame itself, which is why $ fails. You want the object whose name is held in i; the get() function looks that up. (Untested, since what you gave was not reproducible.)

for (filename in filenames) {
   i <- get(filename)
   # results are not auto-printed inside a for loop, so print them explicitly
   print(quantile(i$AVG_RESP,0.95))
   print(max(i$AVG_RESP))
}

This is probably not the best way to solve your problem, though. Putting all the data frames in a list and looping over that list with lapply might be a better approach (what Tyler described in his answer). Furthermore, if these are subsets you've made of a bigger, single data frame you have, then an even better approach would be to use something from the plyr package to define how to split the big data frame up and what to do with each part.
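
If the subsets really do come from one big data frame, the split-and-summarise idea can be sketched without creating df1, df2, ... at all. The sketch below uses base R's split() and lapply() as a stand-in for plyr, and it assumes a hypothetical column SET that says which subset each row belongs to. Neither big, SET, nor summarise_piece comes from the question.

```r
# Hypothetical big data frame; SET marks which subset a row belongs to
big <- data.frame(
    SET      = c("a", "a", "b", "b"),
    AVG_RESP = c(1.276, 2.341, 1.343, 2.473)
)

# split() returns a named list of data frames, one per value of SET
pieces <- split(big, big$SET)

# Hypothetical helper: one row of summary statistics per piece
summarise_piece <- function(piece) {
    data.frame(q95 = unname(quantile(piece$AVG_RESP, 0.95)),
               max = max(piece$AVG_RESP))
}

# Summarise each piece, then stack the one-row results
result <- do.call(rbind, lapply(pieces, summarise_piece))
print(result)
```

The same pieces list could be passed to a plotting function with lapply(), so the graphing code is written only once.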

Upvotes: 3
