Merik

Reputation: 2817

stat_smooth not displayed on the plot

Given the data set shown below, I ran this command to draw a line graph with an overlaid smooth curve:

ggplot(tmp,
       aes(CalendarMonth, Score)) +
  geom_line(stat='identity', group = 1) + ylim(0, 3) +
  theme_few() + ylab('Average score in the month') +
  theme(axis.text.x = element_text(angle=90)) +
  stat_smooth(aes(CalendarMonth, Score), method='loess')

But this draws only the line graph: the output is the same whether or not I include the stat_smooth part, and the smooth curve is never overlaid. What am I missing here?

Data:

tmp <- data.frame(
  CalendarMonth = c('2012-07', '2012-08', '2012-06', '2012-05', '2012-04', '2012-09',
  '2012-10', '2012-11', '2012-12', '2013-01', '2013-02', '2013-03', '2013-04', '2013-05',
  '2013-06', '2013-07', '2013-08', '2013-09', '2013-10', '2013-11', '2013-12', '2014-01',
  '2014-02', '2014-03', '2014-04', '2014-05', '2014-06', '2014-07', '2014-08', '2014-09',
  '2014-10', '2014-11', '2014-12', '2015-01', '2015-02', '2015-03', '2015-04', '2015-05',
  '2015-06', '2015-07', '2015-08', '2015-09', '2015-10', '2015-11', '2015-12', '2016-01',
  '2016-02', '2016-03', '2016-04', '2016-05', '2016-06', '2016-07', '2016-08', '2016-09',
  '2016-10', '2016-11', '2016-12', '2017-01', '2017-02', '2017-03', '2017-04', '2017-05',
  '2017-06', '2017-07', '2017-08', '2017-09'),
  Score = c(2.716667, 2.577465, 2.615385, 3.000000, 3.000000, 2.446429,
  2.426667, 2.683544, 2.526316, 2.568966, 2.506849, 2.537500, 2.578125,
  2.470588, 2.741935, 2.560261, 2.479195, 2.545605, 2.577778, 2.539216,
  2.556492, 2.535593, 2.567829, 2.557214, 2.587662, 2.580189, 2.512069,
  2.572402, 2.582792, 2.555938, 2.512586, 2.561224, 2.572308, 2.557940,
  2.540000, 2.593333, 2.513274, 2.566952, 2.548649, 2.623223, 2.565079,
  2.537344, 2.516667, 2.509485, 2.519084, 2.544262, 2.612795, 2.496429,
  2.467128, 2.596226, 2.560714, 2.563253, 2.588462, 2.569395, 2.668919,
  2.581197, 2.543253, 2.524648, 2.594796, 2.551613, 2.583333, 2.474074,
  2.627306, 2.505017, 2.561086, 2.554545)
)

Upvotes: 2

Views: 1379

Answers (1)

Kamil

Reputation: 412

Your data type is important. As @joran mentioned in the comment, CalendarMonth will need to change type before the smooth can be displayed properly.

We can quickly troubleshoot your issue with str:

> str(tmp)
'data.frame':   66 obs. of  2 variables:
 $ CalendarMonth: Factor w/ 66 levels "2012-04","2012-05",..: 4 5 3 2 1 6 7 8 9 10 ...
 $ Score        : num  2.72 2.58 2.62 3 3 ...

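In general, when you create a data frame you will want to set the parameter stringsAsFactors = FALSE (this is the default since R 4.0.0), so that strings stay character vectors. If you do, you'll need to run as.factor before as.integer, because as.integer on the raw strings returns NA. Compare what as.integer does to the character values and to the factored values: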

> as.integer(as.character(tmp$CalendarMonth))
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[51] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> as.integer(as.factor(as.character(tmp$CalendarMonth)))
 [1]  4  5  3  2  1  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
[51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
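
To make the difference easier to see, here is a small sketch on made-up data (the three-row data frames as_chr and as_fct are hypothetical, not your tmp). It builds the same column once as character and once as factor, then converts each to integer:

```r
# Hypothetical three-row example, built both ways for comparison
months <- c("2012-07", "2012-05", "2012-06")
as_chr <- data.frame(m = months, stringsAsFactors = FALSE)
as_fct <- data.frame(m = months, stringsAsFactors = TRUE)

# Character strings such as "2012-07" are not numbers, so this yields NA
# (suppressWarnings hides the "NAs introduced by coercion" warning)
from_chr <- suppressWarnings(as.integer(as_chr$m))

# A factor converts to its level codes; levels are the sorted unique strings
from_fct <- as.integer(as_fct$m)

# Going through as.factor() first gives the same codes for the character column
from_chr_via_factor <- as.integer(as.factor(as_chr$m))

print(from_chr)             # NA NA NA
print(from_fct)             # 3 1 2
print(from_chr_via_factor)  # 3 1 2
```

The codes are positions in the sorted level list, not values parsed from the strings, which is why "2012-05" becomes 1 here.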

Notice that converting a factor to an integer returns the level codes, and the levels are sorted by comparing the character values. Because your data is in a YYYY-MM pseudo-date format, that alphabetical order happens to match chronological order, so the codes come out in date order. In general, be careful when you rely on this kind of conversion in R: what works for one format may not work for another. For example:

> df <- data.frame(month = c('jan', 'feb', 'mar', 'dec', 'apr'))
> str(df)
'data.frame':   5 obs. of  1 variable:
 $ month: Factor w/ 5 levels "apr","dec","feb",..: 4 3 5 2 1
> as.integer(df$month)
[1] 4 3 5 2 1
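
If you do need integer codes in a meaningful order for values like these, one option is to give the factor its levels explicitly instead of relying on the alphabetical default. A minimal sketch, assuming lower-case three-letter month names as in the example above (the variable names are my own):

```r
months <- c("jan", "feb", "mar", "dec", "apr")

# Default levels are sorted alphabetically: apr, dec, feb, jan, mar
alphabetical <- factor(months)
alphabetical_codes <- as.integer(alphabetical)

# Explicit levels in calendar order; month.abb is R's built-in
# vector "Jan", "Feb", ..., "Dec", lower-cased here to match the data
calendar <- factor(months, levels = tolower(month.abb))
calendar_codes <- as.integer(calendar)

print(alphabetical_codes)  # 4 3 5 2 1
print(calendar_codes)      # 1 2 3 12 4
```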

Be sure you understand how the solution works, so as to avoid a potential headache in the future. With that being said:

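> library(ggplot2)
> library(ggthemes)  # provides theme_few()
> # as.factor() is a no-op on a factor and makes this work for character columns too
> tmp$cm <- as.integer(as.factor(tmp$CalendarMonth))
> ggplot(tmp,
+        aes(CalendarMonth, Score)) +
+     geom_line(stat='identity', group = 1) + ylim(0, 3) +
+     theme_few() + ylab('Average score in the month') +
+     theme(axis.text.x = element_text(angle=90)) +
+     stat_smooth(aes(cm, Score), method='loess')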

Gets you the right graph:

[Image: the line graph with the loess smooth curve overlaid]
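
An alternative that does not depend on factor codes at all is to parse the months as real dates. This is only a sketch under one assumption: that each YYYY-MM value can stand for the first day of its month. The names tmp_small (the first 12 rows of your data, to keep the example short) and month_start are made up for illustration:

```r
library(ggplot2)

# First 12 rows of the question's data, enough for loess to fit
tmp_small <- data.frame(
  CalendarMonth = c("2012-07", "2012-08", "2012-06", "2012-05", "2012-04",
                    "2012-09", "2012-10", "2012-11", "2012-12", "2013-01",
                    "2013-02", "2013-03"),
  Score = c(2.716667, 2.577465, 2.615385, 3.000000, 3.000000, 2.446429,
            2.426667, 2.683544, 2.526316, 2.568966, 2.506849, 2.537500),
  stringsAsFactors = FALSE
)

# Assumption: treat each month as its first day so as.Date() can parse it
tmp_small$month_start <- as.Date(paste0(tmp_small$CalendarMonth, "-01"))

# With a continuous (date) x axis, the line and the smooth share one scale,
# so no group = 1 workaround and no separate integer column are needed
p <- ggplot(tmp_small, aes(month_start, Score)) +
  geom_line() +
  stat_smooth(method = "loess", formula = y ~ x) +
  scale_x_date(date_labels = "%Y-%m") +
  ylab("Average score in the month")

print(p)
```

Because the dates are real values, the rows are also drawn in chronological order regardless of how they are ordered in the data frame.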

Upvotes: 2
