Ndharwood

Reputation: 143

Loop through and plot columns of two identically structured dataframes

I have two dataframes I'd like to plot against each other:

df1 <- data.frame(HV = c(3,3,3), NAtlantic850t = c(0.501, 1.373, 1.88), AO = c(-0.0512, 0.2892, 0.0664))
df2 <- data.frame(HV = c(3,3,2), NAtlantic850t = c(1.2384, 1.3637, -0.0332), AO = c(-0.5915, -0.0596, -0.8842))

They have an identical structure (the same columns in the same order). I'd like to plot them column against column (e.g. df1$HV vs df2$HV): loop through the data frame columns and plot each pair against each other in a scatter plot.

I've looked through 20+ questions asking similar things and can't figure it out; I'd appreciate some help on where to start. Can I use lapply() with plot() or ggplot() when there are two data frames? Should I merge them first?

Upvotes: 0

Views: 906

Answers (3)

MartijnVanAttekum

Reputation: 1445

As you suggest, I would first rearrange the data into a list of plottable data frames (one per column, pairing df1's values with df2's) before calling the plot command. This is especially convenient if you want to pass each one as the data argument to ggplot(). Something like:

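# one small data frame per column name: col1 holds df1's values, col2 holds df2's
plot_dfs <- lapply(names(df1), function(nm) data.frame(col1 = df1[, nm], col2 = df2[, nm]))

# base graphics: one scatter plot per pair
for (df in plot_dfs) plot(x = df[, "col1"], y = df[, "col2"])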

or, using ggplot2:

library(ggplot2)

for (df in plot_dfs) {
  # print() is needed to display a ggplot object inside a loop
  print(
    ggplot(data = df, aes(x = col1, y = col2)) +
      geom_point()
  )
}

And if you want to add the column names as plot titles, you can do:

for (idx in seq_along(plot_dfs)){
  print(
    ggplot(data = plot_dfs[[idx]], aes(x=col1, y=col2)) +
      ggtitle(names(df1)[idx]) +
      geom_point())}

Upvotes: 1

lefft

Reputation: 2105

Here's one way to do it: loop over the column indices and create the plots one by one, adding each one to a list and writing each one to a file:

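library(ggplot2)

# create some data to plot: the numeric iris columns and a row-shuffled copy
df1 <- iris[, sapply(iris, is.numeric)]
df2 <- iris[sample(1:nrow(iris)), sapply(iris, is.numeric)]

# a list to catch each plot object
plot_list <- vector(mode="list", length=ncol(df1))

for (idx in seq_along(df1)){

  # scatter plot of column idx of df1 against column idx of df2
  # (ggplot() + geom_point() instead of qplot(), which is deprecated since ggplot2 3.4.0)
  plot_list[[idx]] <- ggplot(data.frame(x = df1[[idx]], y = df2[[idx]]), aes(x = x, y = y)) +
    geom_point() +
    labs(title=names(df1)[idx])

  # write each plot to its own PDF, named after the column
  ggsave(filename=paste0(names(df1)[idx], ".pdf"), plot=plot_list[[idx]])
}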

As you suggest in the question, you can also use sapply()/lapply() with an anonymous function, for example like this (here the plots are not stored, just written to disk):

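lapply(seq_along(df1), function(idx){
  # index both data frames with idx (not an undefined `id`)
  the_plot <- ggplot(data.frame(x = df1[[idx]], y = df2[[idx]]), aes(x = x, y = y)) +
    geom_point() +
    labs(title=names(df1)[idx])
  ggsave(filename=paste0(names(df1)[idx], ".pdf"), plot=the_plot)
})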

If you want to keep the list of plots (as in the for-loop example), just assign the result of lapply() to a variable (e.g. plot_list) and add a line like return(the_plot) before closing the function.
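A minimal sketch of that change, assuming the same iris-based df1 and df2 as above. The ggsave() line is left out only to keep the example short; it can stay in the function unchanged. The paired values go into a small data frame with hypothetical columns x and y, and the list is additionally named by column, which is an optional extra.

```r
library(ggplot2)

# same kind of example data as above: numeric iris columns and a row-shuffled copy
set.seed(1)
df1 <- iris[, sapply(iris, is.numeric)]
df2 <- iris[sample(nrow(iris)), sapply(iris, is.numeric)]

# assigning the lapply() result keeps one plot object per column
plot_list <- lapply(seq_along(df1), function(idx) {
  # x and y are hypothetical column names for the paired values
  the_plot <- ggplot(data.frame(x = df1[[idx]], y = df2[[idx]]), aes(x = x, y = y)) +
    geom_point() +
    labs(title = names(df1)[idx])
  return(the_plot)
})

# optional: name the list by column so a plot can be pulled out by name
names(plot_list) <- names(df1)
print(plot_list[["Sepal.Width"]])
```

With the list in hand, any single plot can be redrawn or saved later without rebuilding it.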

There are many ways you could adapt this approach, depending on your objectives.

Hope this helps ~~

P.S. If the columns might not be in the same order, it is better to loop over column names instead of column indices (i.e. use for (colname in names(df1)){... instead of for (idx in seq_along(df1)){...). The same [[ subsetting syntax works with both names and indices.
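A small sketch of looping over names, using the question's data but with df2's columns deliberately put in a different order (that reordering is made up for illustration). Because [[ looks each column up by name, every pair still lines up. Base plot() is used here instead of ggplot2 only to keep the example dependency-free.

```r
# question's data; df2's columns are reordered to show that position no longer matters
df1 <- data.frame(HV = c(3, 3, 3),
                  NAtlantic850t = c(0.501, 1.373, 1.88),
                  AO = c(-0.0512, 0.2892, 0.0664))
df2 <- data.frame(AO = c(-0.5915, -0.0596, -0.8842),
                  HV = c(3, 3, 2),
                  NAtlantic850t = c(1.2384, 1.3637, -0.0332))

for (colname in names(df1)) {
  # [[ with a name finds the matching column in each data frame, whatever its position
  plot(df1[[colname]], df2[[colname]],
       main = colname, xlab = "df1", ylab = "df2")
}
```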

Upvotes: 0

KamRa

Reputation: 349

You can loop through the columns like this:

for (col in seq_len(ncol(df1))) {
  # scatter plot of column col of df1 against the same column of df2
  plot(df1[, col], df2[, col])
}

Make sure that both data frames have the same number of columns, in the same order, before running this.
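One way to make that check explicit before the loop, sketched with the question's data and assuming both data frames contain the same set of column names: stopifnot() halts if the column count or names differ, and indexing df2 by df1's names puts df2's columns in df1's order.

```r
df1 <- data.frame(HV = c(3, 3, 3),
                  NAtlantic850t = c(0.501, 1.373, 1.88),
                  AO = c(-0.0512, 0.2892, 0.0664))
df2 <- data.frame(HV = c(3, 3, 2),
                  NAtlantic850t = c(1.2384, 1.3637, -0.0332),
                  AO = c(-0.5915, -0.0596, -0.8842))

# stop early if the two data frames do not hold the same columns
stopifnot(ncol(df1) == ncol(df2), setequal(names(df1), names(df2)))

# reorder df2's columns to match df1 (no change when they already match)
df2 <- df2[names(df1)]

for (col in seq_len(ncol(df1))) {
  plot(df1[, col], df2[, col], main = names(df1)[col])
}
```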

Upvotes: 1
