deeenes

Reputation: 4576

ggplot2: fit geom_smooth() as if a categorical variable were continuous

I have a continuous variable on the y axis and a categorical variable on the x axis. The order of the categorical variable is meaningful, so it would make sense to fit a regression against its index: instead of c('a', 'b', 'c'), use the indices (order(c('a', 'b', 'c')), which is c(1, 2, 3)) and fit the model against those. However, ggplot refuses to fit a geom_smooth(method = lm) if one of the variables is not numeric. OK, so I can tell it to use the order:

geom_smooth(aes(x = order(hgcc), y = rtmean), method = lm)

But then it takes the indices from the whole column of the data frame, which does not work well when faceting with scales = 'free', where only a subset of the levels of the x variable appears in each panel. The indices in the whole data frame are much higher on average, so the regression is plotted far to the right:

[Image: regression pushed to the right]

Here is a minimal working example:

require(ggplot2)
load(url('http://www.ebi.ac.uk/~denes/54b510889336eb2591d8beff/sample_data.RData'))

ggplot(adata12cc, aes(x = hgcc, y = rtmean, color = cls, size = log10(intensity))) +
geom_point(stat = 'sum', alpha = 0.33) +
geom_smooth(
    aes(x = order(hgcc), y = rtmean),
    method = 'glm') +
facet_wrap( ~ uhgroup, scales = 'free') +
scale_radius(guide = guide_legend(title = 'Intensity (log)')) +
scale_color_discrete(guide = guide_legend(title = 'Class')) +
xlab('Carbon count unsaturation') +
ylab('Mean RT [min]') +
ggtitle('RT vs. carbon count & unsaturation by headgroup') +
theme(axis.title = element_text(size = 24),
    axis.text.x = element_text(angle = 90, vjust = 0.5, size = 9, hjust = 1),
    axis.text.y = element_text(size = 11),
    plot.title = element_text(size = 21),
    strip.text = element_text(size = 18),
    panel.grid.minor.x = element_blank())

I know this is not a nice way of doing things, but ggplot could make life so much easier if I could refer to the variables that are subsetted by faceting anyway and do something with them.

Upvotes: 2

Views: 7243

Answers (1)

timat

Reputation: 1500

I think I've got a solution, but I'm not sure it is what you want...

The main problem is that your x-axis labels are already split by uhgroup. If you look at the factor levels, they are PC-O(38.7) PC(38.7 etc...

So the first thing is to create a new hgcc value for the x axis.

# Keep only the last six characters of each label, i.e. drop the headgroup prefix
hgcc_chr <- as.character(adata12cc$hgcc)
adata12cc$hgcc_value <- as.factor(substr(hgcc_chr, nchar(hgcc_chr) - 5, nchar(hgcc_chr)))
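To show what that line does, here is a small sketch on made-up labels. The label strings are hypothetical, modeled on the factor levels mentioned above, and I assume every label ends with a six-character part in parentheses. The `sub()` variant is only an alternative I would consider if the part in parentheses did not always have the same width.

```r
# Hypothetical labels: headgroup prefix followed by a six-character value
labels <- factor(c("PC-O(38.7)", "PC(38.7)", "PC(36.2)"))
chr <- as.character(labels)

# Fixed-width approach: keep the last six characters of each label
value <- substr(chr, nchar(chr) - 5, nchar(chr))
print(value)  # "(38.7)" "(38.7)" "(36.2)"

# Alternative (assumption: the value always starts at the first "("):
# drop everything before the opening parenthesis
value_alt <- sub("^[^(]*", "", chr)
print(value_alt)  # "(38.7)" "(38.7)" "(36.2)"
```

Both labels for 38.7 now share one value, so they end up at the same x position regardless of the headgroup.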

Then another problem is that you use different x variables for geom_point and geom_smooth: one is hgcc, the other is order(hgcc).

The solution is to use the same value for both. Here I use as.numeric(hgcc_value) (instead of order()) and specify the break labels in scale_x_continuous.

ggplot(adata12cc, aes(x = as.numeric(hgcc_value), y = rtmean, color = cls, size = log10(intensity))) +
  geom_point(stat = 'sum', alpha = 0.33) +
  geom_smooth(
    aes(x = as.numeric(hgcc_value), y = rtmean),
    method = 'glm') +
  facet_wrap( ~ uhgroup, scales = 'free') +
  scale_radius(guide = guide_legend(title = 'Intensity (log)')) +
  scale_color_discrete(guide = guide_legend(title = 'Class')) +
  scale_x_continuous(name = "Carbon count unsaturation",
                     # one break per factor level, labelled with the level name
                     breaks = seq_along(levels(adata12cc$hgcc_value)),
                     labels = levels(adata12cc$hgcc_value),
                     minor_breaks = NULL) +
  ylab('Mean RT [min]') +
  ggtitle('RT vs. carbon count & unsaturation by headgroup') +
  theme(axis.title = element_text(size = 24),
        axis.text.x = element_text(angle = 90, vjust = 0.5, size = 9, hjust = 1),
        axis.text.y = element_text(size = 11),
        plot.title = element_text(size = 21),
        strip.text = element_text(size = 18),
        panel.grid.minor.x = element_blank())
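As a side note on why I prefer as.numeric() over order() here, a minimal base R sketch on toy data (the vectors `x` and `y` are made up for illustration): order() gives row positions, which depend on the size of the data frame, while as.numeric() on a factor gives the level code of each element.

```r
# Toy factor with a meaningful level order (made-up data)
x <- factor(c("c", "a", "b", "a"), levels = c("a", "b", "c"))

# order() returns the row positions that would sort x,
# so its values grow with the number of rows in the data
ord <- order(x)

# as.numeric() returns the level code of each element,
# so equal labels always get the same number
codes <- as.numeric(x)

print(ord)    # 2 4 3 1
print(codes)  # 3 1 2 1

# The level codes can be used directly as a numeric predictor
y <- c(3.1, 1.0, 2.2, 0.9)
fit <- lm(y ~ codes)
print(coef(fit))
```

This is why the regression line stays within the panel once both layers use the level codes.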


Is this what you were looking for?

Upvotes: 5
