rjblake34

Reputation: 25

Why is geom_smooth not plotting? (insufficient unique values error)

I'm trying to compare historical daily attendance figures between the Mariners and White Sox.

I created my data frame from a MySQL database and whittled it down to these columns: date, hometeam, dayofweek, and attendance.

I then used lubridate to convert the number that encodes the date into a Date field in R. I also set the attendance of games reporting 0 to NA. I did both with:

sea_attendance <- sea_attendance %>%
  mutate(the_date = ymd(date),
         attendance = ifelse(attendance == 0, NA, attendance))

I tried to plot it with this:

ggplot(sea_attendance,
       aes(x = wday(the_date), y = attendance,
           color = hometeam)) +
  geom_jitter(height = 0, width = 0.2, alpha = 0.2) +
  geom_smooth() +
  scale_y_continuous("Attendance") +
  scale_x_continuous("Day of the Week", breaks = 1:7,
                    labels = wday(1:7, label = TRUE)) +
  scale_color_manual(values = c("blue", "grey"))

It came out pretty cool, but I couldn't get geom_smooth to work:

[Image: jittered plot of attendance by week]

I got this message and these warnings:

`geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Warning messages:
1: Removed 44 rows containing non-finite values (stat_smooth). 
2: Computation failed in `stat_smooth()`:
x has insufficient unique values to support 10 knots: reduce k. 
3: Removed 44 rows containing missing values (geom_point). 

This is a problem out of a textbook. I've been staring at it for an hour trying to figure out where I've gone wrong.

Upvotes: 2

Views: 3852

Answers (1)

Ben Bolker

Reputation: 226182

You probably need something like

geom_smooth(method="gam", formula = y ~ s(x, bs = "cs", k=5))

ggplot2 (calling the mgcv package) is trying to compute a smooth curve through 7 unique x-values (before jittering), and the default number of "knots" (spline breakpoints) is set to 10.
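To make that concrete, here is a minimal sketch of the fix dropped into the plot from the question. The `toy` data frame and the team codes "SEA" and "CHA" are made up for illustration (the real `sea_attendance` data isn't available here), and it assumes ggplot2, lubridate, and mgcv are installed.

```r
library(ggplot2)
library(lubridate)

# Hypothetical stand-in for sea_attendance: 140 consecutive days per team.
set.seed(1)
toy <- data.frame(
  the_date   = rep(seq(as.Date("2015-04-01"), by = "day", length.out = 140),
                   times = 2),
  hometeam   = rep(c("CHA", "SEA"), each = 140),
  attendance = round(runif(280, 10000, 40000))
)

# Day of week takes only 7 distinct values, fewer than the default 10 knots.
n_days <- length(unique(wday(toy$the_date)))

p <- ggplot(toy, aes(x = wday(the_date), y = attendance, color = hometeam)) +
  geom_jitter(height = 0, width = 0.2, alpha = 0.2) +
  # k = 5 keeps the spline basis smaller than the number of unique x-values.
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs", k = 5)) +
  scale_x_continuous("Day of the Week", breaks = 1:7,
                     labels = wday(1:7, label = TRUE)) +
  scale_color_manual(values = c("blue", "grey"))

print(p)
```

Any k comfortably below 7 should behave similarly; 5 is just the value suggested above.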

You could also use an alternative geom_smooth() method, e.g. method="loess" or method="lm". The latter gives a linear fit by default; you could make it polynomial with e.g. formula = y ~ poly(x,3). Or use stat_summary(fun=mean, geom="line") (fun.y=mean in ggplot2 versions before 3.3.0) to connect the means of the groups with a line.
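A rough sketch of two of these alternatives, again on made-up data (the `toy` data frame and team codes are hypothetical, not the asker's data). It assumes ggplot2 3.3.0 or later for the `fun` argument; the loess option is left out of the sketch.

```r
library(ggplot2)
library(lubridate)

# Hypothetical stand-in for sea_attendance: 140 consecutive days per team.
set.seed(1)
toy <- data.frame(
  the_date   = rep(seq(as.Date("2015-04-01"), by = "day", length.out = 140),
                   times = 2),
  hometeam   = rep(c("CHA", "SEA"), each = 140),
  attendance = round(runif(280, 10000, 40000))
)

base <- ggplot(toy, aes(x = wday(the_date), y = attendance, color = hometeam)) +
  geom_jitter(height = 0, width = 0.2, alpha = 0.2)

# Option 1: cubic polynomial fit per team instead of a spline.
p_poly <- base + geom_smooth(method = "lm", formula = y ~ poly(x, 3))

# Option 2: no model at all, just a line through the mean attendance per day.
p_mean <- base + stat_summary(fun = mean, geom = "line")

print(p_poly)
print(p_mean)
```

With only seven x-values, the group-mean line is probably the most honest summary; the polynomial imposes a shape the data may not have.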


Upvotes: 7
