Reputation: 143
I have two dataframes I'd like to plot against each other:
> df1 <- data.frame(HV = c(3,3,3), NAtlantic850t = c(0.501, 1.373, 1.88), AO = c(-0.0512, 0.2892, 0.0664))
> df2 <- data.frame(HV = c(3,3,2), NAtlantic850t = c(1.2384, 1.3637, -0.0332), AO = c(-0.5915, -0.0596, -0.8842))
They have the same structure (identical column names), and I'd like to plot them column against column (e.g. df1$HV vs. df2$HV): loop through the data frame columns and plot each pair against each other in a scatter plot.
I've looked through 20+ questions asking similar things and can't figure it out - would appreciate some help on where to start. Can I use lapply and plot or ggplot when they're two DFs? Should I merge them first?
Upvotes: 0
Views: 906
Reputation: 1445
As you suggest, I would first rearrange the data into a list of plottable data frames before calling the plot command. I think that is especially the way to go if you want to pass each one as the data argument to ggplot. Something like:
plot_dfs <- lapply(names(df1),function(nm)data.frame(col1 = df1[,nm], col2 = df2[,nm]))
for (df in plot_dfs)plot(x = df[,"col1"], y = df[,"col2"])
or using ggplot:
library(ggplot2)  # provides ggplot(), aes() and geom_point()
for (df in plot_dfs){
  print(
    ggplot(data = df, aes(x=col1, y=col2)) +
      geom_point())}
and if you want to add the column names as plot titles, you can do:
for (idx in seq_along(plot_dfs)){
print(
ggplot(data = plot_dfs[[idx]], aes(x=col1, y=col2)) +
ggtitle(names(df1)[idx]) +
geom_point())}
Upvotes: 1
Reputation: 2105
Here’s one way to do it — loop over the column indices and create the plots one by one, adding them to a list and writing each one to a file:
library(ggplot2)
# create some data to plot
df1 <- iris[, sapply(iris, is.numeric)]
df2 <- iris[sample(1:nrow(iris)), sapply(iris, is.numeric)]
# a list to catch each plot object
plot_list <- vector(mode="list", length=ncol(df1))
for (idx in seq_along(df1)){
plot_list[[idx]] <- ggplot2::qplot(df1[[idx]], df2[[idx]]) +
labs(title=names(df1)[idx])
ggsave(filename=paste0(names(df1)[idx], ".pdf"), plot=plot_list[[idx]])
}
As you suggest in the question, you can also use s/lapply() with an anonymous function, e.g. like this (though here we're not storing the plots, just writing each one to disk):
lapply(seq_along(df1), function(idx){
the_plot <- ggplot2::qplot(df1[[idx]], df2[[idx]]) + labs(title=names(df1)[idx])
ggsave(filename=paste0(names(df1)[idx], ".pdf"), plot=the_plot)
})
If you want to keep the list of plots (as in the for-loop example), just assign the result of lapply() to a variable (e.g. plot_list) and add a line like return(the_plot) before closing the function.
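A minimal sketch of that variant, using the df1 and df2 from the question. It builds each plot with ggplot() instead of qplot() (which recent ggplot2 releases mark as deprecated) and skips ggsave() so nothing is written to disk; the plot_data name is just for illustration.

```r
library(ggplot2)

# Data frames from the question
df1 <- data.frame(HV = c(3, 3, 3),
                  NAtlantic850t = c(0.501, 1.373, 1.88),
                  AO = c(-0.0512, 0.2892, 0.0664))
df2 <- data.frame(HV = c(3, 3, 2),
                  NAtlantic850t = c(1.2384, 1.3637, -0.0332),
                  AO = c(-0.5915, -0.0596, -0.8842))

# lapply() returns a list, so assigning its result keeps every plot
plot_list <- lapply(seq_along(df1), function(idx) {
  # pair up the idx-th column of each data frame
  plot_data <- data.frame(x = df1[[idx]], y = df2[[idx]])
  the_plot <- ggplot(plot_data, aes(x = x, y = y)) +
    geom_point() +
    labs(title = names(df1)[idx])
  return(the_plot)
})

# name the list elements so a plot can be looked up by column name
names(plot_list) <- names(df1)
```

With the list named this way, print(plot_list[["AO"]]) draws the AO plot on its own.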
There are many ways you could modify or adapt this approach, depending on your objectives.
Hope this helps ~~
p.s. If the columns might not be in the same order, it is better to loop over column names instead of column indices (i.e. use for (colname in names(df1)){... instead of for (idx in seq_along(df1)){...). You can use the same [[ subsetting syntax with both names and indices.
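A rough sketch of the name-based loop, using base plot() and the question's data with the columns of df2 deliberately reordered. Restricting the loop to intersect(names(df1), names(df2)) is an extra assumption added here so that a column missing from one data frame is skipped rather than causing an error; the shared name is illustrative.

```r
# Question's data, with df2's columns in a different order
df1 <- data.frame(HV = c(3, 3, 3),
                  NAtlantic850t = c(0.501, 1.373, 1.88),
                  AO = c(-0.0512, 0.2892, 0.0664))
df2 <- data.frame(AO = c(-0.5915, -0.0596, -0.8842),
                  HV = c(3, 3, 2),
                  NAtlantic850t = c(1.2384, 1.3637, -0.0332))

# Column names present in both data frames, in df1's order
shared <- intersect(names(df1), names(df2))

for (colname in shared) {
  # [[ looks columns up by name, so column position no longer matters
  plot(df1[[colname]], df2[[colname]],
       xlab = paste("df1", colname),
       ylab = paste("df2", colname),
       main = colname)
}
```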
Upvotes: 0
Reputation: 349
You can loop through the columns like this:
for(col in seq_len(ncol(df1))){
plot(df1[,col], df2[,col])
}
Make sure that both data frames have the same number of columns, and that the columns are in the same order, before running this.
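One possible way to enforce that check before plotting, sketched with the question's data. Comparing names() covers both the column count and the order; the same_layout variable is a name made up for this example.

```r
df1 <- data.frame(HV = c(3, 3, 3),
                  NAtlantic850t = c(0.501, 1.373, 1.88),
                  AO = c(-0.0512, 0.2892, 0.0664))
df2 <- data.frame(HV = c(3, 3, 2),
                  NAtlantic850t = c(1.2384, 1.3637, -0.0332),
                  AO = c(-0.5915, -0.0596, -0.8842))

# TRUE only if both data frames have the same column names in the same order
same_layout <- identical(names(df1), names(df2))

# Stop with an error instead of silently plotting mismatched columns
stopifnot(same_layout)

for (col in seq_len(ncol(df1))) {
  plot(df1[, col], df2[, col], main = names(df1)[col])
}
```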
Upvotes: 1