user1157485

ggplot2 secondary axis strange output

I am trying to make a double y-axis plot with ggplot2. However, the primary y-axis labels (and limits) change, and one of the variables (the "mean" variable) is displayed incorrectly.

Edit: The axis labels for the "mean" variable range from 0.55 to 0.75, which makes the variability difficult to see. However, after the original step for that plot (p <- p + geom_line(aes(y = mean_d, colour = "mean")) + geom_point(aes(y = mean_d, colour = "mean"))) they ranged from 0.7744 to 0.7757. It should be displayed as in that original step (maybe it has to do with the manipulation of the data within the ggplot calls?).

In addition, is it possible to align the tick labels of the first y-axis with those of the second y-axis so that they sit on the same horizontal lines?

# dput(coeff.mean)
coeff.mean <- structure(list(individuals = c(5L, 18L, 31L, 43L, 56L, 69L, 82L, 
95L, 108L, 120L, 133L, 146L, 159L, 172L, 185L, 197L, 210L, 223L, 
236L, 249L, 262L, 274L, 287L, 300L, 313L, 326L, 339L, 351L, 364L, 
377L), mean_d = c(0.775414405190575, 0.774478867355839, 0.774632679560057, 
0.774612015422181, 0.774440717600404, 0.774503749029999, 0.774543337328481, 
0.774536584528457, 0.774518615875444, 0.774572944896752, 0.774553554507719, 
0.774526346948343, 0.774537645238366, 0.774549039219398, 0.774518593880137, 
0.77452848368359, 0.774502654364311, 0.774527249259969, 0.774551190425812, 
0.774524221826879, 0.774514765537317, 0.774541221078135, 0.774552621147008, 
0.774546365564095, 0.774540310535789, 0.774540468208943, 0.774548658706833, 
0.77454534219406, 0.774541081476004, 0.774541996470423), var_d = c(0.000438374265308954, 
0.000345714068446388, 0.000324909665783972, 0.000318897997146887, 
0.000316077108040133, 0.000314032075708385, 0.000310447758209298, 
0.000310325171003455, 0.000311927176741998, 0.000309622062319051, 
0.000308772480851544, 0.000308388263293765, 0.000306838067001956, 
0.000307838047303517, 0.000307737478217495, 0.000306351076037266, 
0.000307288393036824, 0.000306717640522594, 0.000306768886331324, 
0.000306897320278579, 0.000307154374510682, 0.000306352361061403, 
0.000306998606721366, 0.000306434828650204, 0.000305865398401208, 
0.000306061994682725, 0.000305934443005304, 0.000305853730364841, 
0.000306181262913308, 0.000306820996289535)), .Names = c("individuals", 
"mean_d", "var_d"), row.names = c(NA, -30L), class = c("tbl_df", 
"tbl", "data.frame"))

library(ggplot2)

p <- ggplot(coeff.mean, aes(x=individuals))
p <- p + geom_line(aes(y = mean_d, colour = "mean")) + geom_point(aes(y = mean_d, colour = "mean"))
p <- p + geom_line(aes(y = var_d*(max(mean_d)/max(var_d)), colour = "var")) + geom_point(aes(y = var_d*(max(mean_d)/max(var_d)), colour = "var")) 
p <- p + scale_y_continuous(sec.axis = sec_axis(~.*(max(coeff.mean$var_d)/max(coeff.mean$mean_d)), name = "var"))
p <- p + scale_colour_manual(values = c("black", "grey"))
p <- p + labs(y = "mean", x = "Resampled", colour = "Statistic")
print(p)

I would appreciate any advice.

Upvotes: 1

Views: 2307

Answers (2)

Brian

Reputation: 8305

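This shows more clearly what my comment was pointing out: you don't need to scale var_d multiplicatively, you need to add an offset to it. mean_d varies by only about 0.001 around 0.7745, so shifting var_d up by 0.7745 puts both series in the same narrow band, and the secondary axis subtracts the same offset again.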

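library(ggplot2)
library(dplyr)

shift <- 0.7745  # roughly the level of mean_d

coeff.mean %>% 
  ggplot(aes(individuals, mean_d)) +
  geom_point(aes(color = "mean_d")) + geom_line(aes(color = "mean_d")) +
  # shift var_d up so it sits in the same band as mean_d
  geom_point(aes(y = var_d + shift, color = "var_d")) + 
  geom_line(aes(y = var_d + shift, color = "var_d")) +
  # undo the shift on the secondary axis so its labels are in var_d units
  scale_y_continuous(sec.axis = sec_axis(~ . - shift, name = "var_d"))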

Of course, this figure is still problematic: the two series share one panel but are measured on different scales, so it is hard to interpret.
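
A quick check of why the multiplicative scaling in the question flattens the mean line. This is only a sketch: the ranges below are copied by hand from the posted coeff.mean data and rounded, so treat the numbers as approximate.

```r
# Approximate ranges taken from the question's coeff.mean (rounded by hand)
mean_range <- c(0.774441, 0.775414)
var_range  <- c(0.000305854, 0.000438374)

# The question's scaling: var_d * max(mean_d) / max(var_d)
var_scaled <- var_range * max(mean_range) / max(var_range)
print(var_scaled)

# The shared y-axis has to cover both series
axis_span <- diff(range(c(mean_range, var_scaled)))
mean_span <- diff(mean_range)

# Fraction of the axis height actually used by mean_d
print(mean_span / axis_span)
```

The scaled var_d runs from roughly 0.54 up to the maximum of mean_d, so the axis has to span about 0.23 units while mean_d itself uses well under 1% of that height. That matches the 0.55 to 0.75 labels described in the question.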

If you want to scale both multiplicatively and additively, you could try scales::rescale, once to scale var_d to the range of mean_d, and then again to scale the scaled var_d back to the original range.

coeff.mean %>% 
  # map var_d linearly onto the range of mean_d
  mutate(var_rescaled = scales::rescale(var_d, to = range(mean_d))) %>% 
  ggplot(aes(individuals, mean_d)) +
  geom_point(aes(color = "mean_d")) + geom_line(aes(color = "mean_d")) +
  geom_point(aes(y = var_rescaled, color = "var_d")) + 
  geom_line(aes(y = var_rescaled, color = "var_d")) +
  scale_y_continuous(sec.axis = 
    # inverse map; `from` is fixed so the padded axis range does not shift the labels
    sec_axis(~ scales::rescale(., to = range(coeff.mean$var_d),
                               from = range(coeff.mean$mean_d)),
             breaks = scales::pretty_breaks(n = 5),
             name = "var_d"))

This one has problems too. In particular, since the highest values of both mean_d and var_d occur at the same individual (individuals = 5), the two series overlap at that point.
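
On the second part of the question (getting the tick labels of both axes onto the same horizontal lines): with a linear mapping between the two scales, one option is to choose the primary breaks yourself and place the secondary breaks at their images under the mapping. A minimal sketch, assuming a small hypothetical stand-in data frame `dat` in place of coeff.mean; the helper names `to_mean` and `to_var` are made up for illustration.

```r
library(ggplot2)
library(scales)

# Hypothetical stand-in data in the same ballpark as coeff.mean
dat <- data.frame(
  individuals = c(5, 18, 31, 43, 56),
  mean_d = c(0.77541, 0.77448, 0.77463, 0.77461, 0.77444),
  var_d  = c(0.000438, 0.000346, 0.000325, 0.000319, 0.000316)
)

mean_rng <- range(dat$mean_d)
var_rng  <- range(dat$var_d)

# Linear maps between the two scales (illustrative helper names)
to_mean <- function(v) rescale(v, to = mean_rng, from = var_rng)
to_var  <- function(m) rescale(m, to = var_rng, from = mean_rng)

# Pick the primary breaks, then put the secondary breaks at their images
primary_breaks   <- seq(mean_rng[1], mean_rng[2], length.out = 5)
secondary_breaks <- to_var(primary_breaks)

p <- ggplot(dat, aes(individuals)) +
  geom_line(aes(y = mean_d, colour = "mean_d")) +
  geom_line(aes(y = to_mean(var_d), colour = "var_d")) +
  scale_y_continuous(
    breaks = primary_breaks,
    labels = label_number(accuracy = 0.0001),
    sec.axis = sec_axis(~ to_var(.),
                        breaks = secondary_breaks,
                        labels = label_number(accuracy = 0.00001),
                        name = "var_d")
  )
```

Because `secondary_breaks` is just `primary_breaks` pushed through the same linear map that the secondary axis uses, each pair of labels should land at the same height. The labels will not be round numbers on both sides, which is the usual trade-off with this approach.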

Upvotes: 1

bdemarest

Reputation: 14665

Here I show using facets as an alternative to a dual-axis plot. I know it does not answer the original question, sorry!

library(ggplot2)
library(tidyr)

# Convert data to long form with tidyr::gather()
long_dat = gather(data=coeff.mean, key="stat", value="stat_value", mean_d, var_d)

head(long_dat)
# A tibble: 6 x 3
#   individuals   stat stat_value
#         <int>  <chr>      <dbl>
# 1           5 mean_d  0.7754144
# 2          18 mean_d  0.7744789
# 3          31 mean_d  0.7746327
# 4          43 mean_d  0.7746120
# 5          56 mean_d  0.7744407
# 6          69 mean_d  0.7745037

p2 = ggplot(long_dat, aes(x=individuals, y=stat_value, colour=stat)) + 
     geom_point() + 
     geom_line() + 
     scale_colour_manual(values=c(mean_d="black", var_d="grey40")) +
     facet_grid(stat ~ ., scales="free_y")

ggsave("faceted_plot.png", plot=p2, height=4, width=6, dpi=150)
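
As a side note, in newer tidyr versions gather() is superseded by pivot_longer(). A sketch of what I believe is the equivalent reshape, shown on a few hypothetical stand-in rows (`dat`) rather than the full coeff.mean; the row order differs from gather(), which does not matter for the plot.

```r
library(tidyr)

# Hypothetical stand-in rows shaped like coeff.mean
dat <- data.frame(
  individuals = c(5L, 18L, 31L),
  mean_d = c(0.77541, 0.77448, 0.77463),
  var_d  = c(0.000438, 0.000346, 0.000325)
)

# One row per (individuals, stat) pair, same column names as long_dat above
long_dat2 <- pivot_longer(dat, cols = c(mean_d, var_d),
                          names_to = "stat", values_to = "stat_value")
print(long_dat2)
```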

Upvotes: 2
