Reputation: 53
I have a data set that I've re-sampled. Is there a command in R that smooths the data first, so that I can then build the graph from the resulting data frame?
My data is very noisy, and after re-sampling it I want to smooth it. I used geom_smooth to plot the data, but that command only draws the smoothed curve; it does not return the values of the points it plots.
library(ggplot2)
library(dplyr)
library(plotly)
df <- read.csv("data.csv", header = TRUE)
str(df)
rs <- sample_n(df,715)
q <- ggplot(df, aes(x, y)) +
  geom_line() +
  geom_smooth(method = "loess", formula = y ~ log(x), span = 0.05)
This is how I smoothed my data: loess with formula = y~log(x) and span = 0.05. Of all the smoothing methods I tried, this one comes closest to what I want, which is a smooth curve that deviates as little as possible from the original data. I apologize for not giving a reproducible example; I am not far enough into learning R to generate random data. Any help is appreciated, thanks in advance.
Upvotes: 1
Views: 1162
Reputation: 3649
This answer is based on the data at imgur.com/a/L22T4BN
library(tidyverse)
# I've reproduced a subset of your data
df <- data.frame(Date = c('21/05/2019','21/05/2019','21/05/2019','21/05/2019','21/05/2019','21/05/2019','21/05/2019','21/05/2019','21/05/2019'),
Time24 = c('15:45:22',
'15:18:11',
'15:22:10',
'15:18:38',
'15:40:50',
'15:51:42',
'15:38:29',
'15:20:20',
'15:41:34'
),
MPM25 = c(46, 34, 57, 51, 31, 32,46,33,31))
glimpse(df)
# Variables: 4
# $ Date <fct> 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019
# $ Time24 <fct> 15:18:11, 15:18:38, 15:22:10, 15:40:50, 15:45:22, 15:51:42
# $ MPM25 <dbl> 34, 51, 57, 31, 46, 32
# $ datetime <dttm> 2019-05-21 15:18:11, 2019-05-21 15:18:38, 2019-05-21 15:22:10, 2019-0
# Note that Date and Time24 are factors <fct> (in R >= 4.0 they are character by default)
# We can use these values to create a datetime object
# Also note the dates/times are out of order because they come from a random sample
df <- df %>%
  mutate(datetime = str_c(as.character(Date), as.character(Time24), sep = ' ')) %>% # join date and time
  mutate(datetime = lubridate::dmy_hms(datetime)) %>% # convert to datetime object
  mutate(num_datetime = as.numeric(datetime)) %>% # numerical version of datetime required for loess fitting
  arrange(datetime) # put times in order
# Take care with the time zone. The function dmy_hms defaults to UTC.
# You may need to use the time zone for your area, e.g. for me it would be tz = 'Australia/Melbourne'
# we can then plot
df %>%
  ggplot(aes(x = datetime, y = MPM25)) +
  geom_point() +
  geom_smooth(span = 0.9) # loess smooth
# fitting a loess
m_loess <- loess(MPM25 ~ num_datetime, data = df, span = 0.9) #fit a loess model
# Create predictions
date_seq <- seq(from = 1558451891, # 100 points from the first to the last datetime
                to = 1558453902,
                length.out = 100)
m_loess_pred <- predict(m_loess,
                        newdata = data.frame(num_datetime = date_seq))
# To plot the dates they need to be in POSIXct format
date_seq <- as.POSIXct(date_seq, tz = 'UTC', origin = "1970-01-01")
# Create a dataframe with times and predictions
df_predict <- data.frame(datetime = date_seq, pred = m_loess_pred)
# Plot to show that geom_smooth and the loess predictions are the same
df %>%
  ggplot(aes(x = datetime, y = MPM25)) +
  geom_point() +
  geom_smooth(span = 0.9, se = FALSE) +
  geom_point(data = df_predict, aes(x = datetime, y = pred), colour = 'orange')
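If you only need the values that geom_smooth itself drew, you may be able to pull them out of the plot object rather than refitting. The following is a minimal sketch, not part of the original answer: it assumes ggplot2's layer_data() helper, and the data frame demo and its columns are made up for illustration.

```r
library(ggplot2)

# Hypothetical noisy data standing in for the real measurements
set.seed(1)
demo <- data.frame(x = 1:100)
demo$y <- demo$x + rnorm(100, sd = 20)

p <- ggplot(demo, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.9, se = FALSE)

# Layer 1 is geom_point, layer 2 is geom_smooth;
# layer_data() returns the data ggplot2 computed for that layer
smooth_vals <- layer_data(p, 2)[, c("x", "y")]
head(smooth_vals)
```

Note that these values sit on an evenly spaced grid of x values chosen by geom_smooth (80 points by default), not at the original observations. If you need a smoothed value for each original row, the predict() approach is the better fit.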
Upvotes: 0
Reputation: 3649
You can fit a loess model to the data and then use predict to determine the points to plot.
library(tidyverse)
# Generate some noisy data
x <- seq(1,100)
y <- x + rnorm(100, sd = 20)
df <- tibble(x = x, y = y)
# plot with a smooth
df %>%
  ggplot(aes(x,y)) +
  geom_point() +
  geom_smooth(method = "loess")
# Alternatively
m_loess <- loess(y ~ x, df) #fit a loess model
m_loess_pred <- predict(m_loess) # predict for each data point
df <- df %>% # add predictions to data frame for plotting
  add_column(m_loess_pred)
df %>% # plot
  ggplot(aes(x, m_loess_pred)) +
  geom_point()
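To match the settings in the question more closely, the same idea should work with the formula y ~ log(x). This is only a sketch under assumptions: rs here is simulated data standing in for the re-sampled data frame, the column names x and y are taken from the question's aes(x, y), and span = 0.3 is an arbitrary value for this small example (you would substitute your own span, e.g. 0.05).

```r
# Hypothetical stand-in for the re-sampled data frame `rs` (columns x and y assumed)
set.seed(42)
rs <- data.frame(x = seq(1, 200))
rs$y <- 10 * log(rs$x) + rnorm(200, sd = 2)

# Same kind of model as geom_smooth(method = "loess", formula = y ~ log(x));
# span = 0.3 is an arbitrary choice for this example
fit <- loess(y ~ log(x), data = rs, span = 0.3)

# Fitted (smoothed) value at every observed x, stored next to the raw data
rs$y_smooth <- predict(fit)

# The smoothed values are now ordinary data that can be inspected or exported
head(rs)
# write.csv(rs, "smoothed.csv", row.names = FALSE)  # optional export
```

Because the smoothed values live in a column of the data frame, you can plot them with geom_line(aes(x, y_smooth)) or pass them to any other function.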
Upvotes: 3