Reputation: 2817
Given the data set shown below, I ran this command to draw a line graph with an overlaid smooth curve:
library(ggplot2)
library(ggthemes)  # provides theme_few()

ggplot(tmp,
       aes(CalendarMonth, Score)) +
  geom_line(stat='identity', group = 1) + ylim(0, 3) +
  theme_few() + ylab('Average score in the month') +
  theme(axis.text.x = element_text(angle=90)) +
  stat_smooth(aes(CalendarMonth, Score), method='loess')
But this draws only the line graph: the output is the same whether or not I include the stat_smooth part, and the smooth curve is never overlaid. What am I missing here?
Data:
tmp <- data.frame(
CalendarMonth = c('2012-07', '2012-08', '2012-06', '2012-05', '2012-04', '2012-09',
'2012-10', '2012-11', '2012-12', '2013-01', '2013-02', '2013-03', '2013-04', '2013-05',
'2013-06', '2013-07', '2013-08', '2013-09', '2013-10', '2013-11', '2013-12', '2014-01',
'2014-02', '2014-03', '2014-04', '2014-05', '2014-06', '2014-07', '2014-08', '2014-09',
'2014-10', '2014-11', '2014-12', '2015-01', '2015-02', '2015-03', '2015-04', '2015-05',
'2015-06', '2015-07', '2015-08', '2015-09', '2015-10', '2015-11', '2015-12', '2016-01',
'2016-02', '2016-03', '2016-04', '2016-05', '2016-06', '2016-07', '2016-08', '2016-09',
'2016-10', '2016-11', '2016-12', '2017-01', '2017-02', '2017-03', '2017-04', '2017-05',
'2017-06', '2017-07', '2017-08', '2017-09'),
Score = c(2.716667, 2.577465, 2.615385, 3.000000, 3.000000, 2.446429,
2.426667, 2.683544, 2.526316, 2.568966, 2.506849, 2.537500, 2.578125,
2.470588, 2.741935, 2.560261, 2.479195, 2.545605, 2.577778, 2.539216,
2.556492, 2.535593, 2.567829, 2.557214, 2.587662, 2.580189, 2.512069,
2.572402, 2.582792, 2.555938, 2.512586, 2.561224, 2.572308, 2.557940,
2.540000, 2.593333, 2.513274, 2.566952, 2.548649, 2.623223, 2.565079,
2.537344, 2.516667, 2.509485, 2.519084, 2.544262, 2.612795, 2.496429,
2.467128, 2.596226, 2.560714, 2.563253, 2.588462, 2.569395, 2.668919,
2.581197, 2.543253, 2.524648, 2.594796, 2.551613, 2.583333, 2.474074,
2.627306, 2.505017, 2.561086, 2.554545)
)
Upvotes: 2
Views: 1379
Reputation: 412
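Your data type is important. CalendarMonth is a factor, so ggplot2 treats the x axis as discrete and, by default, puts each month in its own group. stat_smooth() then has a single point per group, which is not enough to fit a loess curve, so it draws nothing. As @joran mentioned in the comment, your data will need to change type before you can get a proper display.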
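If you would rather keep CalendarMonth as a factor, another commonly used fix (not the approach this answer takes below) is to give the smoother a single group with aes(group = 1); with a discrete x, ggplot2 otherwise splits the data into one group per month. A minimal sketch, assuming ggplot2 is installed. The data frame is a stand-in holding only the first twelve rows of the question's data so the snippet runs on its own, and theme_few() is left out to avoid the ggthemes dependency:

```r
library(ggplot2)

# Stand-in for the question's `tmp`: only its first twelve rows,
# with CalendarMonth stored as a factor (as in the str() output).
tmp <- data.frame(
  CalendarMonth = factor(c('2012-07', '2012-08', '2012-06', '2012-05', '2012-04',
                           '2012-09', '2012-10', '2012-11', '2012-12', '2013-01',
                           '2013-02', '2013-03')),
  Score = c(2.716667, 2.577465, 2.615385, 3.000000, 3.000000, 2.446429,
            2.426667, 2.683544, 2.526316, 2.568966, 2.506849, 2.537500)
)

p <- ggplot(tmp, aes(CalendarMonth, Score)) +
  geom_line(group = 1) +
  # group = 1 puts every month into the same group, so loess sees all
  # the points together instead of one point per month
  stat_smooth(aes(group = 1), method = 'loess') +
  ylab('Average score in the month') +
  theme(axis.text.x = element_text(angle = 90))

# The computed data for the smoother (layer 2) now contains fitted values
smooth_data <- layer_data(p, 2)
```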
We can quickly troubleshoot your issue with str():
> str(tmp)
'data.frame': 66 obs. of 2 variables:
$ CalendarMonth: Factor w/ 66 levels "2012-04","2012-05",..: 4 5 3 2 1 6 7 8 9 10 ...
$ Score : num 2.72 2.58 2.62 3 3 ...
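In general, when you create a data frame you will want to set stringsAsFactors = FALSE (the default since R 4.0.0). If you do, CalendarMonth stays a character vector, and you'll need to run as.factor before as.integer. Have a look at what as.integer does to the month strings: converted straight from character, every value becomes NA (with a coercion warning), while going through a factor gives integer codes: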
> as.integer(as.character(tmp$CalendarMonth))
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[51] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> as.integer(as.factor(as.character(tmp$CalendarMonth)))
[1] 4 5 3 2 1 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
[51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
You'll notice that the integer codes follow calendar order. That is only because the values use a zero-padded YYYY-MM format, so sorting them as text also sorts them chronologically. Be careful with this kind of conversion in R: factor() sorts its levels by comparing the character values, and as.integer returns each value's position among those sorted levels. What works for one format may not work for another. For example, month names sort alphabetically rather than chronologically:
> df <- data.frame(month = c('jan', 'feb', 'mar', 'dec', 'apr'))
> str(df)
'data.frame': 5 obs. of 1 variable:
$ month: Factor w/ 5 levels "apr","dec","feb",..: 4 3 5 2 1
> as.integer(df$month)
[1] 4 3 5 2 1
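To make the mechanism concrete, here is a small base-R sketch (variable names are illustrative only): as.integer() on a factor is just each value's position in levels(), and supplying the levels explicitly, here via the built-in month.abb constant, is one way to make month names come out in calendar order:

```r
months <- c('jan', 'feb', 'mar', 'dec', 'apr')

# factor() sorts the levels alphabetically, and as.integer() returns
# each value's position in that sorted list (not calendar order)
f_alpha <- factor(months)
levels(f_alpha)        # "apr" "dec" "feb" "jan" "mar"
as.integer(f_alpha)    # 4 3 5 2 1
identical(as.integer(f_alpha), match(months, levels(f_alpha)))  # TRUE

# Declaring the level order explicitly makes the codes follow the calendar;
# month.abb is a built-in constant: "Jan", "Feb", ..., "Dec"
f_cal <- factor(months, levels = tolower(month.abb))
as.integer(f_cal)      # 1 2 3 12 4
```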
Be sure you understand how the solution works, so you avoid a potential headache in the future. With that said, map the smoother's x to the integer codes instead of the factor:
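tmp$cm <- as.integer(as.factor(tmp$CalendarMonth))  # level codes: 1 = "2012-04", 2 = "2012-05", ...
ggplot(tmp,
       aes(CalendarMonth, Score)) +
  geom_line(stat='identity', group = 1) + ylim(0, 3) +
  theme_few() + ylab('Average score in the month') +
  theme(axis.text.x = element_text(angle=90)) +
  # numeric x for the smoother: no discrete variable splits the data,
  # so all points form one group and loess can fit a curve
  stat_smooth(aes(cm, Score), method='loess')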
This gives you the line graph with the loess curve overlaid.
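A different route, sketched here as an alternative rather than the method above, is to turn the YYYY-MM strings into real dates so the x axis is continuous from the start and no helper column of integer codes is needed. This assumes ggplot2 is installed and again uses only the first twelve rows of the question's data; scale_x_date() and coord_cartesian() are swapped in for illustration:

```r
library(ggplot2)

# Stand-in for the question's `tmp`: only its first twelve rows
tmp <- data.frame(
  CalendarMonth = c('2012-07', '2012-08', '2012-06', '2012-05', '2012-04',
                    '2012-09', '2012-10', '2012-11', '2012-12', '2013-01',
                    '2013-02', '2013-03'),
  Score = c(2.716667, 2.577465, 2.615385, 3.000000, 3.000000, 2.446429,
            2.426667, 2.683544, 2.526316, 2.568966, 2.506849, 2.537500)
)

# Treat each "YYYY-MM" label as the first day of that month, giving a
# Date column; the x axis then becomes continuous and time-ordered
tmp$Month <- as.Date(paste0(tmp$CalendarMonth, '-01'))

p <- ggplot(tmp, aes(Month, Score)) +
  geom_line() +
  stat_smooth(method = 'loess') +
  # label the date axis the same way as the original strings
  scale_x_date(date_labels = '%Y-%m') +
  # zoom to 0-3 without dropping data (ylim() removes out-of-range
  # values, which can clip the smoother's confidence band)
  coord_cartesian(ylim = c(0, 3)) +
  ylab('Average score in the month')

# Computed data for the smoother (layer 2)
smooth_data <- layer_data(p, 2)
```

A side benefit of a Date axis is that spacing stays proportional to time even if some months are missing from the data, whereas factor codes are always consecutive.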
Upvotes: 2