Henrik Ode

Reputation: 191

ggplot geom_smooth fits one line accurately and the other as a flat line

I have a dataset where the red line pictured below is based on 2400 data points. The green line is based on 220 data points.

I am running ggplot as follows:

library(ggplot2)

ggplot(data, aes(HB_PRE_MIN, d30, color = factor(preop_transfusion_factor))) +
  geom_smooth()

It produces this plot:

[Image: both smoothed curves; the red curve follows the data, the green curve is a flat line]

and prints this message: geom_smooth() using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

If I filter the data to only the 220 rows with preop_transfusion_factor == 1 (the green curve) and run geom_smooth on those, I get the curve below, which I think is what the green curve should look like if it were fitted correctly. Code:

library(dplyr)
library(ggplot2)

ggplot(data = data %>% filter(preop_transfusion_factor == 1), aes(HB_PRE_MIN, d30)) +
  geom_smooth() +
  scale_x_continuous(breaks = seq(0, 12, by = 1))

[Image: smoothed curve for preop_transfusion_factor == 1 only]

Summary:

I would like to plot both smoothed curves correctly; right now the curve with about ten times fewer data points simply becomes a flat line. I suspect this is because the red curve is based on many more data points, which somehow interferes with fitting the other curve, but that is only a hypothesis.

I have tried using the group aesthetic instead of color, but the result is the same: the green curve (factor = 1) becomes a flat line.

If you know how to get both curves fitted correctly and displayed on the same graph, I would appreciate hearing it, since that makes it more obvious where they differ from each other.

Kind regards.

Upvotes: 2

Views: 505

Answers (2)

Henrik Ode

Reputation: 191

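Update: I found a solution. When no method is given, geom_smooth() picks the smoothing method from the size of the largest group in the plot. Because the red curve has 2400 data points (1000 or more), it automatically chooses method = 'gam', and that same method is then used for the green curve as well.

The blue curve, that is, the plot of only the 220 data points, has fewer than 1000 points, so it automatically chooses 'loess'.

If I choose 'loess' for both data sets, I get a much better fit for both of them.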
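A minimal sketch of that fix, assuming ggplot2 is installed. The data frame is hypothetical: it reuses the question's column names and group sizes (2400 and 220), but the values are simulated, so the curves will not match the plots in the question. Passing formula = y ~ x (the usual loess formula) only silences the "using formula" message.

```r
library(ggplot2)

# Hypothetical stand-in for the question's `data`: column names and group
# sizes are copied from the question, but the values are simulated.
set.seed(42)
n0 <- 2400  # rows with preop_transfusion_factor == 0
n1 <- 220   # rows with preop_transfusion_factor == 1
data <- data.frame(
  HB_PRE_MIN = c(runif(n0, 6, 12), runif(n1, 4, 10)),
  preop_transfusion_factor = rep(c(0, 1), times = c(n0, n1))
)
# Simulated binary outcome whose risk falls as HB_PRE_MIN rises
data$d30 <- rbinom(
  nrow(data), size = 1,
  prob = plogis(1 - 0.6 * data$HB_PRE_MIN + data$preop_transfusion_factor)
)

# Setting method = "loess" explicitly overrides the automatic choice,
# which would pick gam for both groups because the largest group has >= 1000 rows.
p <- ggplot(data, aes(HB_PRE_MIN, d30, color = factor(preop_transfusion_factor))) +
  geom_smooth(method = "loess", formula = y ~ x)

print(p)
```

Both groups are now smoothed with loess in a single layer, so the colors and legend work as in the original plot.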

Kind regards.

Upvotes: 2

itsMeInMiami

Reputation: 2783

It is not elegant but try moving the two levels into different datasets. Then use two geom_smooth calls:

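library(dplyr)
library(ggplot2)

# Split the two factor levels into separate data frames
preop0 <- data %>% filter(preop_transfusion_factor == 0)
preop1 <- data %>% filter(preop_transfusion_factor == 1)

# `data` goes outside aes(); each layer picks its smoothing method from its own size
ggplot() +
  geom_smooth(data = preop0, aes(HB_PRE_MIN, d30), color = "red") +
  geom_smooth(data = preop1, aes(HB_PRE_MIN, d30), color = "green")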

Upvotes: 1
