Creating scatter plots in a loop with R, and adding regression lines to the plot

Question

I am creating a scatter plot for each unique value in the first column of my data file. The scatter plots are created fine, but I would like to add a regression line to each of these graphs. With my current approach I get a regression line on only one of the graphs (coat.pdf). This regression line is also just y=1 and does not follow the data. I would like a regression line on each graph that fits the data. I would also like to be able to use R in a more object-oriented fashion, such as "plot.addregression", because with these loose functions being applied I feel like I do not quite know what they are accessing.

rates = read.csv("file.txt")
for(i in unique(rates[,1])){
        dev.new()
        freq = unlist(rates[2])
        temp = unlist(rates[3])
        fMatch = freq[rates[1] == toString(i)]
        tMatch = temp[rates[1] == toString(i)]
        plot(fMatch,tMatch)#,xlab="freq",ylab="temp")
        abline(lm(fMatch~tMatch), col="red")
        file.rename("Rplots.pdf", paste(i,".pdf",sep=""))
        dev.off()
}

file.txt

clothing,freq,temp
coat,0.3,10
coat,0.9,0
coat,0.1,20
hat,0.5,20
hat,0.3,15
hat,0.1,5
scarf,0.4,30
scarf,0.2,20
scarf,0.1,10

user2554330 · Accepted Answer

There are a few problems with your code.

dev.new() opens a device, but it's not necessarily a pdf device: it's better to open that device explicitly if that's what you want.
unlist(rates[2]) will probably work in this case, but is not the usual way to extract a column from a dataframe. rates[[2]] or rates[,2] are more usual. But extracting columns isn't even necessary: use the data argument to plot and lm instead.
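Your subsetting is fragile. rates[1] is a one-column dataframe rather than a vector, so rates[1] == toString(i) produces a logical matrix, and it is hard to tell at a glance what indexing with it will do. It would be better to subset the dataframe you use in the data argument.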
plot has two common forms: plot(x, y) or plot(y ~ x). You appear to have used plot(y, x), which swaps the axes, while lm(fMatch ~ tMatch) still models freq as a function of temp, so the fitted line is drawn against the wrong axes. Only the plot(y ~ x) form works with the data argument, so I'd use that. It's also consistent with lm(), another advantage.
paste0(...) is a convenient shorthand for paste(..., sep="").
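
To make the points about column extraction and subsetting concrete, here is a small sketch using the data from the question. It reads the data inline with read.csv(text = ...) so that it runs without file.txt; the variable names (freq_a, coat, fit and so on) are made up for illustration.

```r
# Inline copy of file.txt from the question, so the sketch is self-contained.
rates <- read.csv(text = "clothing,freq,temp
coat,0.3,10
coat,0.9,0
coat,0.1,20
hat,0.5,20
hat,0.3,15
hat,0.1,5
scarf,0.4,30
scarf,0.2,20
scarf,0.1,10")

# Three ways to pull out the freq column. unlist() attaches names
# (freq1, freq2, ...), so they are dropped before comparing.
freq_a <- rates[[2]]
freq_b <- rates[, 2]
freq_c <- unname(unlist(rates[2]))
same_columns <- identical(freq_a, freq_b) && identical(freq_a, freq_c)
print(same_columns)

# Subset the dataframe itself, then let lm() find the columns via `data`.
coat <- rates[rates$clothing == "coat", ]
fit <- lm(freq ~ temp, data = coat)
print(coef(fit))
```

Because the formula is freq ~ temp, the fitted coefficients describe freq as a function of temp, which is why the plot should also put temp on the horizontal axis.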

So here's a translation that probably does what you want:

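rates = read.csv("file.txt")
for(i in unique(rates[,1])){
  # Open a pdf device explicitly, named after the current clothing item
  pdf(file = paste0(i, ".pdf"))
  # Keep only the rows for this clothing item
  match <- rates[rates[,1] == i, ]
  # Same formula for plot() and lm(), so the line matches the axes
  plot(freq ~ temp, data = match)#,xlab="freq",ylab="temp")
  abline(lm(freq ~ temp, data = match), col="red")
  # Close the device so the file is written
  dev.off()
}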
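
On the wish for something like "plot.addregression": base graphics functions such as plot() and abline() draw on whichever device is currently active, so there is no plot object to call a method on. One way to make it clearer what is being accessed is to wrap the steps in a function that opens the device, draws, and closes it again. The following is only a sketch under that assumption; plot_with_regression, its arguments, and the use of tempdir() as the output directory are hypothetical choices, not part of the answer above.

```r
# Hypothetical helper: writes one scatter plot with a fitted line to `file`
# and returns the fitted model invisibly.
plot_with_regression <- function(dat, fml, file) {
  pdf(file = file)
  on.exit(dev.off())          # close the device even if plotting fails
  plot(fml, data = dat)
  fit <- lm(fml, data = dat)
  abline(fit, col = "red")
  invisible(fit)
}

# Inline copy of file.txt from the question, so the sketch is self-contained.
rates <- read.csv(text = "clothing,freq,temp
coat,0.3,10
coat,0.9,0
coat,0.1,20
hat,0.5,20
hat,0.3,15
hat,0.1,5
scarf,0.4,30
scarf,0.2,20
scarf,0.1,10")

# One dataframe per clothing item; one pdf per dataframe.
groups <- split(rates, rates$clothing)
out_dir <- tempdir()          # assumption: write to a temporary directory
fits <- lapply(names(groups), function(name) {
  plot_with_regression(groups[[name]], freq ~ temp,
                       file.path(out_dir, paste0(name, ".pdf")))
})
names(fits) <- names(groups)
```

Returning the fits means the models can be inspected afterwards, for example with coef(fits$coat), without redrawing anything.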
