Hanif Shidki

Reputation: 53

Is there a function to smooth data and return the smoothed values in a new data frame?

I have a set of data that I've re-sampled. Is there a command in R that smooths the data first, so that I can then create the graph from the resulting data frame?

My data has a lot of noise, and after re-sampling it I now want to smooth it. I used geom_smooth to plot the smoothed data, but it only produces the graphical representation; it doesn't give me the values of the points it plots.


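library(dplyr)    # for sample_n()
library(ggplot2)  # for ggplot(), geom_line(), geom_smooth()

df <- read.csv("data.csv", header = TRUE)

rs <- sample_n(df, 715)  # random re-sample of 715 rows

q <-
  ggplot(rs, aes(x, y)) +
  geom_line() +
  geom_smooth(method = "loess", formula = y ~ log(x), span = 0.05)
q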

This is what I used to smooth my data. I chose loess with formula = y ~ log(x) and span = 0.05 because, of all the smoothing methods I tried, it gave the result closest to what I want: smoothing with the smallest differences from the original data. I apologize for not giving a reproducible example; I am not yet far enough into learning R to create random data. Any help is appreciated, thanks in advance.
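Since data.csv isn't available, here is a minimal sketch that assumes simulated data in its place. It makes noisy data with rnorm(), fits the loess model that geom_smooth(method = "loess", formula = y ~ log(x), span = 0.05) should be fitting, and stores the fitted values in a new data frame. The names sim, fit and smoothed, and the shape of the simulated curve, are hypothetical.

```r
# Reproducible noisy data standing in for data.csv (hypothetical values)
set.seed(42)
x <- seq(1, 715)                             # x must be positive because the model uses log(x)
y <- 10 * log(x) + rnorm(length(x), sd = 2)  # smooth trend plus random noise
sim <- data.frame(x = x, y = y)

# Fit loess directly, with the same formula and span as in geom_smooth()
fit <- loess(y ~ log(x), data = sim, span = 0.05)

# New data frame holding the smoothed value at every original x
smoothed <- data.frame(x = sim$x,
                       y_smooth = predict(fit, newdata = data.frame(x = sim$x)))
head(smoothed)
```

If you would rather keep using geom_smooth(), ggplot2's layer_data() returns the data computed for one layer of a plot; layer_data(q, 2) should give the smooth layer of q above, evaluated on geom_smooth()'s grid of x values rather than at the original points.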

Upvotes: 1

Views: 1162

Answers (2)

Tony Ladson

Reputation: 3649

This answer is based on the data at imgur.com/a/L22T4BN


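library(dplyr)      # mutate(), arrange(), %>%
library(stringr)    # str_c()
library(lubridate)  # dmy_hms()
library(ggplot2)    # ggplot(), geom_point(), geom_smooth()

# I've reproduced a subset of your data
# (the original listing was cut off; these are the six rows shown in the glimpse() output below)

df <- data.frame(Date = rep('21/05/2019', 6),
                 Time24 = c('15:45:22', '15:18:11', '15:22:10', '15:18:38', '15:40:50', '15:51:42'),
                 MPM25 = c(46, 34, 57, 51, 31, 32),
                 stringsAsFactors = TRUE)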

# Variables: 4
# $ Date     <fct> 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019
# $ Time24   <fct> 15:18:11, 15:18:38, 15:22:10, 15:40:50, 15:45:22, 15:51:42
# $ MPM25    <dbl> 34, 51, 57, 31, 46, 32
# $ datetime <dttm> 2019-05-21 15:18:11, 2019-05-21 15:18:38, 2019-05-21 15:22:10, 2019-0

# Note that the Date and Time24 are factors <fct>
# We can use these values to create a datetime object
# Also note the dates/times are out of order because they come from a random sample

df <- df %>%    
  mutate(datetime = str_c(as.character(Date), as.character(Time24), sep = ' ')) %>% # join date and time
  mutate(datetime = lubridate::dmy_hms(datetime)) %>% # convert to datetime object
  mutate(num_datetime = as.numeric(datetime)) %>% # numerical version of datetime required for loess fitting
  arrange(datetime)  # put times in order

# Take care with the time zone. The function dmy_hms() defaults to UTC.
# You may need to use the time zone for your area, e.g. for me it would be tz = 'Australia/Melbourne'

# we can then plot

df %>% 
  ggplot(aes(x = datetime, y = MPM25)) +
  geom_point() +
  geom_smooth(span = 0.9) # loess smooth

# fitting a loess 
m_loess <- loess(MPM25 ~ num_datetime, data = df, span = 0.9) #fit a loess model

# Create predictions

date_seq <- seq(from = 1558451891, # 100 points from the first to the last datetime
                to = 1558453902,
                length.out = 100)

m_loess_pred <- predict(m_loess, 
                        newdata = data.frame(num_datetime =  date_seq)) 

# To plot the dates they need to be in POSIXct format
date_seq <- as.POSIXct(date_seq, tz = 'UTC', origin = "1970-01-01")

# Create a dataframe with times and predictions
df_predict <- data.frame(datetime = date_seq, pred = m_loess_pred)

# Plot to show that geom_smooth and the loess predictions are the same
df %>% 
  ggplot(aes(x = datetime, y = MPM25)) +
  geom_point() +
  geom_smooth(span = 0.9, se = FALSE) +
  geom_point(data = df_predict, aes(x = datetime, y = pred) , colour = 'orange')
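loess() needs a numeric predictor, which is why the model is fitted on num_datetime and the prediction grid is converted back with as.POSIXct(). A small base-R check of that round trip, assuming UTC as dmy_hms() uses by default; base as.POSIXct() with an explicit format stands in for lubridate here. It reproduces the two endpoints of date_seq (1558451891 and 1558453902), which are the first and last observation times in seconds since 1970-01-01 UTC.

```r
# Parse the first and last times in the subset, in UTC (the dmy_hms() default)
first <- as.POSIXct("21/05/2019 15:18:11", format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
last  <- as.POSIXct("21/05/2019 15:51:42", format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Seconds since 1970-01-01 00:00:00 UTC: the numeric scale loess() works on
first_num <- as.numeric(first)
last_num  <- as.numeric(last)
print(c(first_num, last_num))

# Going back from seconds to a date-time for plotting
back <- as.POSIXct(first_num, tz = "UTC", origin = "1970-01-01")
format(back, "%Y-%m-%d %H:%M:%S")
```

Parsing the same clock time in another zone gives a different number of seconds, so the conversion back to POSIXct should use the same tz as the parsing step.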

Upvotes: 0

Tony Ladson

Reputation: 3649

You can fit a loess model to the data and then use predict() to get the smoothed values as ordinary numbers, which you can store in a data frame and plot.


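library(dplyr)    # tibble(), mutate(), %>%
library(ggplot2)  # ggplot(), geom_point(), geom_smooth(), geom_line()

# Generate some noisy data
set.seed(1)  # make the random noise reproducible
x <- seq(1, 100)
y <- x + rnorm(100, sd = 20)

df <- tibble(x = x, y = y)

# Plot with a smooth
df %>%
  ggplot(aes(x, y)) +
  geom_point() +
  geom_smooth(method = "loess")

# Alternatively, fit the model yourself so the smoothed values are available
m_loess <- loess(y ~ x, df)       # fit a loess model
m_loess_pred <- predict(m_loess)  # predict for each data point

df <- df %>%                      # add predictions to the data frame for plotting
  mutate(pred = m_loess_pred)

df %>%                            # plot the data and the smoothed values
  ggplot(aes(x, y)) +
  geom_point() +
  geom_line(aes(y = pred))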
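geom_smooth() also draws a confidence band around the curve. A minimal sketch, assuming you want those values as well: predict(..., se = TRUE) on a loess fit returns the fit, its standard errors and the residual degrees of freedom, from which an approximate 95% band can be built with qt(). The 80-point grid mirrors geom_smooth()'s default n = 80; the names grid, pred, level, half_width and band are illustrative.

```r
# Noisy data as in the answer (seeded so it is reproducible)
set.seed(1)
x <- seq(1, 100)
y <- x + rnorm(100, sd = 20)
df <- data.frame(x = x, y = y)

m_loess <- loess(y ~ x, data = df)

# Evaluate the smooth on an evenly spaced grid of 80 x values
grid <- data.frame(x = seq(min(df$x), max(df$x), length.out = 80))
pred <- predict(m_loess, newdata = grid, se = TRUE)

# Fitted values plus an approximate 95% band from the t distribution
level <- 0.95
half_width <- qt(level / 2 + 0.5, pred$df) * pred$se.fit
band <- data.frame(x = grid$x,
                   fit = pred$fit,
                   lower = pred$fit - half_width,
                   upper = pred$fit + half_width)
head(band)
```

band is an ordinary data frame, so it can be saved, inspected, or plotted with geom_line() and geom_ribbon().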

Upvotes: 3
