Lee Walston

Reputation: 35

For Loop Across Specific Column Range in R

I have a wide data frame consisting of 1000 rows and over 300 columns. The first 2 columns are GroupID and Categorical fields. The remaining columns are all continuous numeric measurements. What I would like to do is loop through a specific range of these columns in R, beginning with the first numeric column (column #3). For example, loop through columns 3:10. I would also like to retain the column names in the loop. I've started with the following code:

for(i in 3:ncol(df)){
  print(i)
} 

But this includes all columns to the right of column #3 (not the range 3:10), and this does not identify column names. Can anyone help get me started on this loop so I can specify the column range and also retain column names? TIA!

Side Note: I've used tidyr to gather the data frame in long format. That works, but I've found it makes my data frame very large and therefore eats a lot of time and memory in my loop.

Upvotes: 0

Views: 2617

Answers (2)

Dealec

Reputation: 287

You can keep the column names by passing them to lapply. Here's an example with the iris dataset:

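library(ggplot2)

lapply(names(iris)[2:4], function(columntoplot) {
  # Build a one-column data frame from the current column
  df <- data.frame(datatoplot = iris[[columntoplot]])
  graphname <- columntoplot

  # Histogram titled with the column name
  p <- ggplot(df, aes(x = datatoplot)) +
    geom_histogram() +
    ggtitle(graphname)

  # Save the plot to a file named after the column
  ggsave(filename = paste0(graphname, ".png"), plot = p, width = 4, height = 4)
})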

In the lapply function, you create a new dataset comprising one column (note the double brackets). You can then plot and optionally save the output within the function (see ggsave line). You're then able to use the column name as the plot title as well as the file name.
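
A plain for loop over a fixed column range works along the same lines. The following is only a minimal sketch: the data frame `df`, its size, and its column names are made up for illustration, and computing a mean stands in for whatever you actually need to do with each column.

```r
# Hypothetical data: 2 ID columns followed by 10 numeric columns
set.seed(1)
df <- data.frame(GroupID = 1:5,
                 Category = letters[1:5],
                 matrix(rnorm(50), nrow = 5))

cols <- names(df)[3:10]          # names of columns 3 through 10 only
col_means <- numeric(0)

for (col in cols) {
  values <- df[[col]]            # double brackets return the column as a vector
  col_means[col] <- mean(values) # store the result under the column name
}

print(col_means)
```

Looping over the names rather than the positions means the column name is available inside the loop for titles, file names, or labelling results.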

Upvotes: 0

Duck

Reputation: 39595

Since you did not include your data, I created similar dummy data (1000 rows and 302 columns, including 2 ID variables) to show you how to select columns and prepare them for plotting:

library(reshape2)
library(ggplot2)
set.seed(123)
#Dummy data
Numvars <- as.data.frame(matrix(rnorm(1000*300),nrow = 1000,ncol = 300))
vec1 <- 1:1000
vec2 <- rep(paste0('class',1:5),200)
IDs <- data.frame(vec1,vec2,stringsAsFactors = FALSE)
#Bind data
Data <- cbind(IDs,Numvars)
#Select vars (the 2 ID columns plus the first 10 numeric columns)
df <- Data[,1:12]
#Prepare for plot
df.melted <- melt(data = df,id.vars = c('vec1','vec2'))
#Plot
ggplot(df.melted,aes(x=vec1,y=value,group=variable,color=variable))+
  geom_line()+
  facet_wrap(~vec2)

You will end up with a line plot of the ten selected variables against vec1, colored by variable, with one panel per class in vec2 (image not available).

I hope this helps.
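
If reshaping to long format is too costly, as the side note in the question suggests, the column range can also be processed directly in wide format. This is a rough sketch under assumed data: `wide`, its dimensions, and the choice of `sd` as the per-column computation are all hypothetical.

```r
# Hypothetical wide data: 2 ID columns plus 20 numeric columns
set.seed(123)
wide <- data.frame(id = 1:100,
                   grp = rep(paste0("class", 1:5), 20),
                   matrix(rnorm(100 * 20), nrow = 100))

rng <- 3:10                                  # column positions to process
col_sd <- vapply(wide[rng], sd, numeric(1))  # one value per column, names kept
print(col_sd)
```

Because `wide[rng]` is still a data frame, `vapply` carries the column names through to the result, so nothing needs to be melted first.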

Upvotes: 1
