Hanif Shidki

Reputation: 53

Is there a function to smooth data and return the smoothed values in a new data frame?

I have a set of data which I've re-sampled. Is there a command in R that smooths the data first, so that I can then create the graph from the resulting data frame?

My data is very noisy, and after re-sampling it I want to smooth it. I used geom_smooth to plot the data, but it only draws the smoothed curve; it does not return the values of the points it represents.

library(ggplot2)
library(dplyr)
library(plotly)

df <- read.csv("data.csv", header = TRUE)

str(df)

rs <- sample_n(df,715)

q <- 
  ggplot(df,aes(x,y)) + 
  geom_line() + 
  geom_smooth(method = "loess", formula = y~log(x), span = 0.05)

This is what I used to smooth my data. I chose loess with formula = y~log(x) and span = 0.05 because, of all the smoothing methods I've tried, it gives the result closest to what I want: a smooth with the smallest differences from the original data. I apologize for not giving a reproducible example; I am not far enough into learning R to generate random data. Any help is appreciated, thanks in advance.

Upvotes: 1

Views: 1162

Answers (2)

Tony Ladson

Reputation: 3649

This answer is based on the data at imgur.com/a/L22T4BN

library(tidyverse)

# I've reproduced a subset of your data

df <- data.frame(
  Date = c('21/05/2019', '21/05/2019', '21/05/2019',
           '21/05/2019', '21/05/2019', '21/05/2019',
           '21/05/2019', '21/05/2019', '21/05/2019'),
  Time24 = c('15:45:22',
             '15:18:11',
             '15:22:10',
             '15:18:38',
             '15:40:50',
             '15:51:42',
             '15:38:29',
             '15:20:20',
             '15:41:34'),
  MPM25 = c(46, 34, 57, 51, 31, 32, 46, 33, 31)
)


glimpse(df)              
# Variables: 4
# $ Date     <fct> 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019
# $ Time24   <fct> 15:18:11, 15:18:38, 15:22:10, 15:40:50, 15:45:22, 15:51:42
# $ MPM25    <dbl> 34, 51, 57, 31, 46, 32
# $ datetime <dttm> 2019-05-21 15:18:11, 2019-05-21 15:18:38, 2019-05-21 15:22:10, 2019-0

# Note that Date and Time24 are factors <fct>
# (in R >= 4.0 they are character <chr>, because stringsAsFactors defaults to FALSE)
# We can use these values to create a datetime object
# Also note the dates/times are out of order because they come from a random sample

df <- df %>%    
  mutate(datetime = str_c(as.character(Date), as.character(Time24), sep = ' ')) %>% # join date and time
  mutate(datetime = lubridate::dmy_hms(datetime)) %>% # convert to datetime object
  mutate(num_datetime = as.numeric(datetime)) %>% # numerical version of datetime required for loess fitting
  arrange(datetime)  # put times in order

# Take care with the time zone. The function dmy_hms defaults to UTC.
# You may need to pass the time zone for your area, e.g. for me it would be tz = 'Australia/Melbourne'


# we can then plot

df %>% 
  ggplot(aes(x = datetime, y = MPM25)) +
  geom_point() +
  geom_smooth(span = 0.9) # loess smooth


# fitting a loess 
m_loess <- loess(MPM25 ~ num_datetime, data = df, span = 0.9) #fit a loess model

# Create predictions

date_seq <- seq(from = min(df$num_datetime), # 100 points from the first to the last datetime
                to = max(df$num_datetime),
                length.out = 100)



m_loess_pred <- predict(m_loess, 
                        newdata = data.frame(num_datetime =  date_seq)) 


# To plot the dates they need to be in POSIXct format
date_seq <- as.POSIXct(date_seq, tz = 'UTC', origin = "1970-01-01")

# Create a dataframe with times and predictions
df_predict <- data.frame(datetime = date_seq, pred = m_loess_pred)


# Plot to show that geom_smooth and the loess predictions are the same
df %>% 
  ggplot(aes(x = datetime, y = MPM25)) +
  geom_point() +
  geom_smooth(span = 0.9, se = FALSE) +
  geom_point(data = df_predict, aes(x = datetime, y = pred) , colour = 'orange')
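
If only the values that geom_smooth itself drew are needed, they can also be pulled out of the built plot. The following is a minimal sketch, not part of the original answer: it assumes synthetic data (df_demo is a made-up name) and uses ggplot2's layer_data(), which returns the data computed for a given layer.

```r
library(ggplot2)

# Synthetic noisy data standing in for the real measurements (assumption)
set.seed(42)
df_demo <- data.frame(x = 1:100)
df_demo$y <- df_demo$x + rnorm(100, sd = 20)

p <- ggplot(df_demo, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.9, se = FALSE)

# layer_data() returns the data computed for one layer of the plot;
# layer 1 is geom_point, layer 2 is geom_smooth
smooth_df <- layer_data(p, 2)[, c("x", "y")]

head(smooth_df)
```

The resulting smooth_df holds the x positions chosen by geom_smooth and the fitted y values, so it can be saved or plotted like any other data frame. Fitting loess directly, as above, gives more control over where the predictions are made.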

Upvotes: 0

Tony Ladson

Reputation: 3649

You can fit a loess model to the data and then use predict to determine the points to plot.

library(tidyverse)   

# Generate some noisy data
set.seed(123) # make the random noise reproducible
x <- seq(1, 100)
y <- x + rnorm(100, sd = 20)

df <- tibble(x = x, y = y)

# plot with a smooth
df %>% 
  ggplot(aes(x,y)) +
  geom_point() +
  geom_smooth(method = "loess")

# Alternatively
m_loess <- loess(y ~ x, data = df) # fit a loess model
m_loess_pred <- predict(m_loess) # predict for each data point

df <- df %>% # add predictions to data frame for plotting
  add_column(m_loess_pred)  

df %>% # plot
  ggplot(aes(x,m_loess_pred)) +
  geom_point() 
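
The same idea can be adapted to the smoother described in the question (loess on log(x)) and to predictions on a new grid rather than at the original points. This is a hedged sketch on made-up data: df_demo, grid and smoothed are illustrative names, and the span of 0.3 is chosen only for this small example, not taken from the question.

```r
# Synthetic data (assumption); x must be positive for log(x)
set.seed(1)
df_demo <- data.frame(x = 1:200)
df_demo$y <- 10 * log(df_demo$x) + rnorm(200, sd = 2)

# Loess fit on log(x), as in the question's geom_smooth call
m <- loess(y ~ log(x), data = df_demo, span = 0.3)

# Evenly spaced grid inside the observed range of x
# (loess does not extrapolate by default)
grid <- data.frame(x = seq(min(df_demo$x), max(df_demo$x), length.out = 50))

# se = TRUE returns a list with the fitted values and their standard errors
pred <- predict(m, newdata = grid, se = TRUE)

# Collect the smoothed values and an approximate +/- 2 SE band in a data frame
smoothed <- data.frame(
  x = grid$x,
  fit = pred$fit,
  lower = pred$fit - 2 * pred$se.fit,
  upper = pred$fit + 2 * pred$se.fit
)

head(smoothed)
```

The smoothed data frame can then be passed to ggplot, for example with geom_line(aes(x, fit)), or written out with write.csv.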

Upvotes: 3
