James Kyle


geom_smooth coloring for two values per row

I have a data set with two values per row that I'd like to plot on the same graph.

For example:

RHC,1,0.370,0.287,0.003,0.063
SA,1,0.352,0.258,0.003,0.057
GA,1,0.121,0.091,0.430,0.008

I want to plot an individual line per column, grouped by the first column. E.g. for the RHC row, I'm plotting {x,y1} and {x,y2} of {1,0.370} and {1,0.287} respectively.

The following ggplot/geom_smooth accomplishes this:

ggplot(data=d) + 
  geom_smooth(aes(x=iterations, y=training.error, col=algorithm)) + 
  geom_smooth(aes(x=iterations, y=testing.error, col=algorithm))

However, both lines end up with a single legend entry and a single color, making them impossible to differentiate.

How can I apply a different color and respective legend entry for each line produced by each geom_smooth call?

To reproduce:

library(ggplot2)
d <- read.csv("https://gist.githubusercontent.com/jameskyle/8d233dcbd0ad0b66bfdd/raw/9c975ac9d9bbcb633e44cfd70b66f7ab89dc1517/results.csv")

p1 <- ggplot(data=d) +
    geom_smooth(aes(x=iterations, y=training.error, col=algorithm)) +
    geom_smooth(aes(x=iterations, y=testing.error, col=algorithm))

pdf("graph.pdf")
print(p1)
dev.off()

The above code produces a plot in which the training and testing curves for each algorithm share one color and one legend entry.


Answers (1)

Jaap


Because you have several lines quite close to each other in one plot, it is probably better to use facets to get a clearer plot. To do that, the data should first be reshaped into long format.

With the data.table package you can melt several groups of columns into long format in a single call:

library(data.table)

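# melt the error and time columns simultaneously; patterns() groups the
# measure columns by regular expression, one value column per pattern
d1 <- melt(setDT(d),
           measure.vars = patterns('\\.error$', '\\.time$'),
           value.name = c('error','time'))

# melt() numbers the columns within each group (1 = training, 2 = testing);
# replace those numbers with readable labels
d1[, variable := c('train','test')[variable]]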
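To see what the melted table looks like, here is a minimal sketch using the three example rows from the question. The column names `training.time` and `testing.time`, and the order of the columns, are assumptions inferred from the `.time` pattern and the plotting code; the real CSV may differ.

```r
library(data.table)

# toy data: the three example rows from the question
# (column names and order are assumed, not confirmed by the CSV)
d_toy <- data.table(
  algorithm      = c("RHC", "SA", "GA"),
  iterations     = c(1, 1, 1),
  training.error = c(0.370, 0.352, 0.121),
  testing.error  = c(0.287, 0.258, 0.091),
  training.time  = c(0.003, 0.003, 0.430),
  testing.time   = c(0.063, 0.057, 0.008)
)

# melt both column groups at once: one value column per pattern
long <- melt(d_toy,
             measure.vars = patterns("\\.error$", "\\.time$"),
             value.name   = c("error", "time"))

# variable is 1 for the first column of each group (training.*),
# 2 for the second (testing.*); swap in readable labels
long[, variable := c("train", "test")[variable]]

print(long)
```

Each original row becomes two rows, one labelled `train` and one labelled `test`, with `error` and `time` as ordinary columns that can be mapped inside `aes()`.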

Now you can make the faceted plot (I've added a fill as well to differentiate between the shaded confidence bands):

ggplot(data=d1) +
  geom_smooth(aes(x=iterations, y=error, col=variable, fill=variable), size=1) +
  facet_grid(. ~ algorithm) +
  theme_bw()

This results in one panel per algorithm, with the train and test curves distinguished by color and fill.

If you really want everything in one plot, you can add a linetype to the aes as well in order to better differentiate between the several lines:

ggplot(data=d1) +
  geom_smooth(aes(x=iterations, y=error, col=algorithm, linetype=variable), size=1) +
  theme_bw()

The result is a single plot in which color identifies the algorithm and linetype distinguishes train from test.
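If every line should get its own color and legend entry, as the question asked, one possible approach is to map color to the combination of algorithm and variable. The sketch below uses synthetic data shaped like `d1` (the `toy` data frame, its numbers, and the `series` column are hypothetical, not the question's results).

```r
library(ggplot2)

set.seed(1)  # synthetic data only, for a reproducible illustration

# hypothetical long-format data shaped like d1: one row per
# algorithm / iteration / variable combination
toy <- expand.grid(
  iterations = 1:30,
  algorithm  = c("RHC", "SA", "GA"),
  variable   = c("train", "test"),
  stringsAsFactors = FALSE
)
toy$error <- 1 / sqrt(toy$iterations) + rnorm(nrow(toy), sd = 0.02)

# one factor level per algorithm/variable pair, e.g. "RHC.train"
toy$series <- interaction(toy$algorithm, toy$variable)

# colour and fill both follow series, so each smoothed curve
# gets its own color and its own legend entry
p <- ggplot(toy, aes(x = iterations, y = error,
                     colour = series, fill = series)) +
  geom_smooth() +
  theme_bw()
```

Calling `print(p)` draws the plot. With `d1` from above, the equivalent mapping would be `colour = interaction(algorithm, variable)` inside `aes()`; with six curves in one panel the colors can still be hard to tell apart, which is why the faceted version may remain the clearer choice.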

