Wesley Lozano

Reputation: 65

GGPLOT: How can I plot loess curves for specified subsets of my data points?

(I'm self-taught in R and use this forum often, but this is my first post. Feedback is appreciated.)

This should have a relatively simple solution, but I can't find it and it's making me want to throw my computer out the window. On to the point, I have a simple data set:

mydata <- structure(list(Date = c("2020-06-22", "2020-06-22", "2020-06-23", 
"2020-06-23", "2020-06-24", "2020-06-24", "2020-06-25", "2020-06-25", 
"2020-06-26", "2020-06-26", "2020-06-29", "2020-06-29", "2020-06-30", 
"2020-06-30", "2020-07-01", "2020-07-01", "2020-07-02", "2020-07-02", 
"2020-07-06", "2020-07-06", "2020-07-06", "2020-07-06", "2020-07-07", 
"2020-07-07", "2020-07-08", "2020-07-08", "2020-07-08", "2020-07-09", 
"2020-07-09", "2020-07-09"), Location = c("Haskell", "Bustamante", 
"Haskell", "Bustamante", "Haskell", "Bustamante", "Bustamante", 
"Haskell", "Bustamante", "Haskell", "Bustamante", "Haskell", 
"Bustamante", "Haskell", "Bustamante", "Haskell", "Bustamante", 
"Haskell", "Bustamante", "Haskell", "Bustamante", "Haskell", 
"Bustamante", "Haskell", "Bustamante", "Haskell", "Tap Water", 
"Bustamante", "Haskell", "Tap Water"), UVT = c(72.2, 65.6, 70, 
61.8, 71.5, 63.9, 63.9, 71.5, 68.1, 71.5, 68.9, 71.3, 71.3, 72.4, 
68.9, 67.3, 49.4, 49, 39.3, 42.3, 64.2, 70.9, 33.3, 49.3, 46, 
48.8, 88.7, 66, 70.5, 84.7), Source = c("Shawn", "Shawn", "Jesus", 
"Jesus", "Jesus", "Jesus", "Jesus", "Jesus", "Jesus", "Jesus", 
"Jesus", "Jesus", "Jesus", "Jesus", "Jesus", "Jesus", "Jesus", 
"Jesus", "Jesus", "Jesus", "Shawn", "Shawn", "Jesus", "Jesus", 
"Jesus", "Jesus", "Jesus", "Jesus", "Jesus", "Jesus")), row.names = c(NA, 
-30L), class = "data.frame")

First, I tried plotting the data grouped by location, but I'm guessing that since the "Tap Water" group only has 2 data points, it doesn't have enough points for a loess fit:

#Import Packages
library(tidyverse)

#Import Data
mydata <- read.csv("L:\\2019\\19W06195 - EPW HRS and RRB WWTPs Disinfection Study\\Design\\Design Criteria\\R\\UVT Graphs\\UVTdata.csv")

#Plot
p <- ggplot(data=mydata, aes(x=as.Date(mydata[,1], "%Y-%m-%d"), y=mydata[,3], color=mydata[,2])) + geom_point() + geom_smooth(method = "loess", se = FALSE)

p + scale_x_date(date_breaks = "days" , date_labels = "%b-%d")

Plot attempt #1

This is the error I received:

Warning messages:
1: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  ... :
  span too small.   fewer data values than degrees of freedom.
2: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  ... :
  at  18451
3: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  ... :
  radius  2.5e-005
4: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  ... :
  all data on boundary of neighborhood. make span bigger
5: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  ... :
  pseudoinverse used at 18451
6: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  ... :
  neighborhood radius 0.005
7: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  ... :
  reciprocal condition number  1
8: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  ... :
  at  18452
9: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  ... :
  radius  2.5e-005
10: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  ... :
  all data on boundary of neighborhood. make span bigger
11: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  ... :
  There are other near singularities as well. 2.5e-005
12: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  ... :
  zero-width neighborhood. make span bigger
13: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  ... :
  zero-width neighborhood. make span bigger
14: Computation failed in `stat_smooth()`:
NA/NaN/Inf in foreign function call (arg 5)

Note that running this same code, but specifying "method=lm" rather than "method=loess", works perfectly, but doesn't show the trend that I want.

linear regression model

To fix this, I tried setting a condition to default to a linear regression for data subsets with too few data points:

sProduct <- unique(mydata[,2])
p <- ggplot(mydata, aes(as.Date(mydata[,1], "%Y-%m-%d"), mydata[,3], color = mydata[,2])) + geom_point()

for (i in sProduct){

  sMethod <- ifelse(sum(mydata[,2] == i) <= 5, "lm", "loess")
  p <- p + geom_smooth(data = subset(mydata, mydata[,2] == i), method = sMethod, se = FALSE)
}

p

Despite this effort, I now get an aesthetic error:

Error: Aesthetics must be either length 1 or the same as the data (14): x, y and colour
Run `rlang::last_error()` to see where the error occurred.

I assume this is due to inconsistency in the number of data points between the geom_points and the subsets of data in geom_smooth, but I'm not certain. I also tried setting subsets of data to exclude the "Tap Water" from the geom_smooth, as I'm not generally interested in the trend there anyway:

p <- ggplot(data=mydata, aes(x=as.Date(mydata[,1], "%Y-%m-%d"), y=mydata[,3], color=mydata[,2])) + geom_point() + geom_smooth(data=subset(mydata, Location=="Bustamante" | Location=="Haskell"), method = "loess", se = FALSE)
p + scale_x_date(date_breaks = "days" , date_labels = "%b-%d")

This yields the same error. Any help here would be greatly appreciated! Thanks!

Upvotes: 2

Views: 2198

Answers (2)

stefan

Reputation: 124308

Simply map the variable names to the aesthetics instead of putting columns of the data frame inside aes().

library(dplyr)
library(ggplot2)

mydata1 <- mydata %>% 
  mutate(Date = as.Date(Date, "%Y-%m-%d")) %>% 
  add_count(Location) %>% 
  mutate(method = ifelse(n <= 5, "lm", "loess"))

p <- ggplot(data=mydata1, aes(x=Date, y=UVT, color=Location)) + 
  geom_point()

p + 
  geom_smooth(data = filter(mydata1, method == "loess"), method = "loess", se = FALSE) +
  geom_smooth(data = filter(mydata1, method == "lm"), method = "lm", se = FALSE)
#> `geom_smooth()` using formula 'y ~ x'
#> `geom_smooth()` using formula 'y ~ x'
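
As for why the original attempts fail, here is a minimal base-R sketch. It rests on my reading of the error: a vector such as mydata[,3] inside aes() keeps the full length of mydata, while a layer given data = subset(...) has fewer rows. The data frame toy is a hypothetical stand-in made from the last six rows of the question's data, and sub is just an illustrative name.

```r
# Hypothetical stand-in: the last six rows of the question's data
toy <- data.frame(
  Date     = c("2020-07-08", "2020-07-08", "2020-07-08",
               "2020-07-09", "2020-07-09", "2020-07-09"),
  Location = c("Bustamante", "Haskell", "Tap Water",
               "Bustamante", "Haskell", "Tap Water"),
  UVT      = c(46, 48.8, 88.7, 66, 70.5, 84.7)
)

# The kind of subset handed to geom_smooth(data = ...)
sub <- subset(toy, Location != "Tap Water")

nrow(sub)         # 4 rows in the layer's data
length(toy[, 3])  # 6: the full column, which is what aes(y = toy[, 3]) would supply

# A bare column name is looked up in whichever data frame the layer uses,
# so its length always matches that layer
length(sub$UVT)   # 4
```

With aes(x = Date, y = UVT, color = Location), each layer picks up the columns from its own data, so the lengths agree whether the layer uses the full data frame or a filtered copy.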

Upvotes: 2

YBS

Reputation: 21297

Try setting formula='y ~ x' in geom_smooth, as in:

geom_smooth(method = "loess", formula='y ~ x', se = FALSE)

Then you will get the following output (dates not formatted here):

output

Upvotes: 0
