Reputation: 4576
I have a continuous variable on the y axis and a categorical variable on the x axis. The order of the categories is meaningful, so it would make sense to fit a regression against their index: instead of c('a', 'b', 'c'), use the indices order(c('a', 'b', 'c')), which is c(1, 2, 3), and fit the model against those. However, ggplot refuses to fit a geom_smooth(method = lm) if one variable is not numeric. OK, then I can tell it to use the order:
geom_smooth(aes(x = order(hgcc), y = rtmean), method = lm)
But then it takes the indices from the whole column of the data frame, which does not work when faceting with scales = 'free', where only a subset of the levels of the x variable appears in each panel. The indices over the whole data frame are much higher on average, so the regression line is plotted far to the right.
Here is a minimal working example:
require(ggplot2)
load(url('http://www.ebi.ac.uk/~denes/54b510889336eb2591d8beff/sample_data.RData'))
ggplot(adata12cc, aes(x = hgcc, y = rtmean, color = cls, size = log10(intensity))) +
    geom_point(stat = 'sum', alpha = 0.33) +
    geom_smooth(
        aes(x = order(hgcc), y = rtmean),
        method = 'glm') +
    facet_wrap( ~ uhgroup, scales = 'free') +
    scale_radius(guide = guide_legend(title = 'Intensity (log)')) +
    scale_color_discrete(guide = guide_legend(title = 'Class')) +
    xlab('Carbon count unsaturation') +
    ylab('Mean RT [min]') +
    ggtitle('RT vs. carbon count & unsaturation by headgroup') +
    theme(axis.title = element_text(size = 24),
          axis.text.x = element_text(angle = 90, vjust = 0.5, size = 9, hjust = 1),
          axis.text.y = element_text(size = 11),
          plot.title = element_text(size = 21),
          strip.text = element_text(size = 18),
          panel.grid.minor.x = element_blank())
I know this is not the nicest way of doing things, but ggplot could make life so much easier if I could refer to the variables as they are subsetted by faceting anyway, and do something with them.
Upvotes: 2
Views: 7243
Reputation: 1500
I think I've got a solution, but I'm not sure what you want...
The main problem is that your x-axis labels are already split by uhgroup. If you look at the factor levels, they are PC-O(38.7), PC(38.7), etc.
So the first thing is to create a new hgcc value for the x axis:
# Keep only the last six characters of each label, e.g. "(38.7)"
lbl <- as.character(adata12cc$hgcc)
adata12cc$hgcc_value <- as.factor(substr(lbl, nchar(lbl) - 5, nchar(lbl)))
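To see what this extraction does, here is a minimal sketch on toy labels. The three labels are made up to resemble the ones quoted above, not taken from the real data, and the approach assumes the part in parentheses is always exactly six characters long.

```r
# Toy labels modelled on the ones quoted above (assumed, not from the real data)
hgcc <- factor(c("PC-O(38.7)", "PC(38.7)", "PC(36.2)"))

# Work on the character labels rather than on the factor codes
lbl <- as.character(hgcc)

# Last six characters of each label: the part in parentheses,
# without the headgroup prefix
suffix <- substr(lbl, nchar(lbl) - 5, nchar(lbl))
suffix

# Labels from different headgroups now share the same factor levels
hgcc_value <- as.factor(suffix)
levels(hgcc_value)
```

If some labels had a longer or shorter part in parentheses, a fixed width of six would cut in the wrong place, so it is worth checking levels(hgcc_value) on the real data.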
Then another problem is that you use different x values for geom_point and geom_smooth. One is hgcc, the other is order(hgcc).
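A side note on order(), as a small sketch on toy data (the vector x below is made up for illustration): order() returns the permutation that would sort its input, not a per-element level index, so it only looks like c(1, 2, 3) when the input is already sorted. as.numeric() on a factor gives the integer code of each element's level.

```r
# Toy factor, deliberately not in sorted order
x <- factor(c("b", "c", "a", "b"))

# Positions that would sort x: element 3 ("a") comes first, and so on
order(x)

# Integer level code of each element: a = 1, b = 2, c = 3
as.numeric(x)
```

Only the second result puts equal categories at the same x position, which is what a regression against the category index needs.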
The solution is to use the same value for both. Here I use as.numeric(hgcc_value) (instead of order()) and specify the labels of the breaks in scale_x_continuous.
ggplot(adata12cc, aes(x = as.numeric(hgcc_value), y = rtmean, color = cls, size = log10(intensity))) +
    geom_point(stat = 'sum', alpha = 0.33) +
    geom_smooth(
        aes(x = as.numeric(hgcc_value), y = rtmean),
        method = 'glm') +
    facet_wrap( ~ uhgroup, scales = 'free') +
    scale_radius(guide = guide_legend(title = 'Intensity (log)')) +
    scale_color_discrete(guide = guide_legend(title = 'Class')) +
    # One break per factor level, labelled with the level itself
    scale_x_continuous(name = "Carbon count unsaturation",
                       breaks = seq_along(levels(adata12cc$hgcc_value)),
                       labels = levels(adata12cc$hgcc_value),
                       minor_breaks = NULL) +
    ylab('Mean RT [min]') +
    ggtitle('RT vs. carbon count & unsaturation by headgroup') +
    theme(axis.title = element_text(size = 24),
          axis.text.x = element_text(angle = 90, vjust = 0.5, size = 9, hjust = 1),
          axis.text.y = element_text(size = 11),
          plot.title = element_text(size = 21),
          strip.text = element_text(size = 18),
          panel.grid.minor.x = element_blank())
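For checking the idea without the plot, here is a rough base R sketch on toy data. The data frame d, its values and the "PE" group are made up for illustration. It fits one straight line per panel on the level codes, ignoring the colour grouping that the plot above also applies, and builds the breaks and labels that map the codes back to the original text.

```r
# Toy data (made up): two panels sharing one set of x-axis labels
d <- data.frame(
  uhgroup    = rep(c("PC", "PE"), each = 3),
  hgcc_value = factor(c("(34.1)", "(36.2)", "(38.7)",
                        "(36.2)", "(38.7)", "(40.6)")),
  rtmean     = c(1, 2, 3, 2.5, 3.5, 4.5)
)

# Numeric x: the integer code of each label's level
d$x <- as.numeric(d$hgcc_value)

# One straight-line fit per panel, on the level codes
fits <- lapply(split(d, d$uhgroup),
               function(g) coef(lm(rtmean ~ x, data = g)))
fits

# Breaks and labels that map the codes back to the original labels
brks <- seq_along(levels(d$hgcc_value))
labs <- levels(d$hgcc_value)
data.frame(brks, labs)
```

Because the codes come from the levels of the whole column, the same label sits at the same x position in every panel, and with scales = 'free' each panel only shows the range it actually uses.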
Is this what you were looking for?
Upvotes: 5