Hanif Shidki

Reputation: 53

How do I smooth data and store the smoothed values in a new data frame?

I have a data set that I've re-sampled. Is there a command or function in R that I can use to smooth the data first, and only then create the graph from the resulting data frame?

My data has a lot of noise, and after re-sampling it I want to smooth it out. I used geom_smooth to plot the data, but that only draws the smoothed curve; it does not give me the values of the points it represents.


# load ggplot2 and helper packages
library(ggplot2)
library(dplyr)
library(plotly)

df <- read.csv("data.csv", header = TRUE)

str(df)

rs <- sample_n(df,715)

q <- 
  ggplot(df,aes(x,y)) + 
  geom_line() + 
  geom_smooth(method = "loess", formula = y~log(x), span = 0.05)

This is what I used to smooth my data. I chose loess with formula = y~log(x) and span = 0.05 because, of all the smoothing methods I've tried, it gives the result closest to what I want: a smooth curve with the smallest differences from the original data.

This is the output of head(rs) and glimpse(rs):

> head(rs)
        Date  DLTime   Time24   RH Temp PM2.5     CO2    MCO2 MPM25                   t
1 21/05/2019 8:33:21 15:21:36 73.5 25.9    34 1096.88 1096.88    34 2019-05-21 15:21:36
2 21/05/2019 8:56:33 15:44:48 75.4 25.6    32  975.00  975.00    32 2019-05-21 15:44:48
3 21/05/2019 8:22:43 15:10:58 75.9 26.1    59 1068.75 1068.75    59 2019-05-21 15:10:58
4 21/05/2019 8:51:53 15:40:08 74.7 25.6    45  975.00  975.00    45 2019-05-21 15:40:08
5 21/05/2019 8:47:30 15:35:45 75.0 25.7    40 1006.25 1006.25    40 2019-05-21 15:35:45
6 21/05/2019 8:35:59 15:24:14 73.7 25.8    32 1984.38 1068.75    32 2019-05-21 15:24:14
> glimpse(rs)
Observations: 715
Variables: 10
$ Date   <fct> 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019,...
$ DLTime <fct> 8:33:21, 8:56:33, 8:22:43, 8:51:53, 8:47:30, 8:35:59, 8:17:13, 8:57:42, 8:20:34, 8:48:21, 8:34:...
$ Time24 <fct> 15:21:36, 15:44:48, 15:10:58, 15:40:08, 15:35:45, 15:24:14, 15:05:28, 15:45:57, 15:08:49, 15:36...
$ RH     <dbl> 73.5, 75.4, 75.9, 74.7, 75.0, 73.7, 76.6, 75.1, 75.6, 75.1, 74.4, 75.6, 73.8, 76.6, 73.9, 76.3,...
$ Temp   <dbl> 25.9, 25.6, 26.1, 25.6, 25.7, 25.8, 26.2, 25.6, 26.1, 25.7, 25.9, 25.8, 25.4, 26.2, 25.5, 26.2,...
$ PM2.5  <int> 34, 32, 59, 45, 40, 32, 42, 34, 35, 45, 36, 33, 29, 42, 46, 36, 42, 33, 35, 33, 39, 32, 39, 35,...
$ CO2    <dbl> 1096.88, 975.00, 1068.75, 975.00, 1006.25, 1984.38, 1328.13, 946.88, 1068.75, 1328.13, 1434.38,...
$ MCO2   <dbl> 1096.88, 975.00, 1068.75, 975.00, 1006.25, 1068.75, 1037.50, 946.88, 1068.75, 1021.88, 1112.50,...
$ MPM25  <dbl> 34, 32, 59, 45, 40, 32, 42, 34, 35, 45, 36, 33, 29, 42, 46, 36, 42, 33, 35, 33, 39, 32, 39, 35,...
$ t      <dttm> 2019-05-21 15:21:36, 2019-05-21 15:44:48, 2019-05-21 15:10:58, 2019-05-21 15:40:08, 2019-05-21...

I have also tried

ml <- with(rs, loess(formula = y~log(x), span = 0.5))

mp <- predict(ml)

but this variant resulted in the following error message:

ml <- loess(formula = y~log(x), with(rs), span = 0.5)
Error in eval(substitute(expr), data, enclos = parent.frame()) : 
  argument is missing, with no default

I don't really understand where I went wrong, because none of the troubleshooting I found online gave me a definitive answer. If there are other methods, please tell me.

I apologize for not giving a reproducible example; I am not far enough into learning R to generate random data. Any help is appreciated, thanks in advance.

Upvotes: 1

Views: 1098

Answers (3)

Hanif Shidki

Reputation: 53

Considering I only needed to know the smooth value of one variable, I used:

# smooth a single variable; Num is a 1..n index column added to rs
ml <- loess(MCO2 ~ log(Num), data = rs, span = 0.5)

# fitted (smoothed) values at the observed points
mp <- predict(ml)

# store the smoothed values in the resampled data frame
rs$pre <- mp

I added a new column, `Num`, to my data containing a running number for each row (1 to the last row), so that I could use it as `x` in the `y~log(x)` formula. When I used the `t` variable instead, which is a POSIXct date-time created with `as.POSIXct`, it resulted in an error.

To keep the values of the predicted data, I used:

write.csv(rs,"newdata.csv", row.names = FALSE)
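For anyone who wants to keep the real time axis instead of a row index, here is a minimal sketch under my own assumptions: the data frame below is made up to stand in for `rs`, and the `secs` column is a hypothetical helper that is not in the original data. The idea is only that loess needs a numeric predictor, so the POSIXct column is converted to elapsed seconds first. I left out `log()` here because elapsed time starts at 0 and `log(0)` is `-Inf`.

```r
# Made-up data standing in for the question's `rs` (assumption, not the real data)
set.seed(42)
n <- 200
rs <- data.frame(
  t = as.POSIXct("2019-05-21 15:00:00", tz = "UTC") + sort(sample(0:3600, n))
)
rs$MCO2 <- 1000 + 50 * sin(seq(0, 2 * pi, length.out = n)) + rnorm(n, sd = 20)

# loess needs a numeric predictor, so convert the date-time to
# seconds elapsed since the first observation (hypothetical helper column)
rs$secs <- as.numeric(difftime(rs$t, min(rs$t), units = "secs"))

# fit the smoother and keep the fitted values next to the raw data
ml <- loess(MCO2 ~ secs, data = rs, span = 0.5)
rs$pre <- predict(ml)

head(rs)
```

The `pre` column can then be plotted against `t` directly, since each fitted value still sits on the same row as its original timestamp.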

Thank you for all the help and answers.

@Andrew Baxter
@S Robidoux
@Jon Spring
@Jimbou

Upvotes: 0

Andy Baxter

Reputation: 7636

As mentioned above, the loess function fits the same smoother as geom_smooth(method = "loess"). When you pass the fitted model to the predict function, you get a vector of smoothed (fitted) values of the dependent variable, one per observation. You can plot these on a graph to check:

library(dplyr)
library(ggplot2)

a <- rnorm(100)
b <- rnorm(100, mean = 4, sd = 20)*a

df <- tibble(a,b)

df_predict <- df[,"a"]

df_predict[,"b"] <- df %>%
  loess(b ~ a, data = ., span = 0.5) %>% 
  predict()

df %>%
  ggplot(aes(a,b)) +
  geom_point(col = "blue") +
  geom_smooth(method = "loess", span = 0.5, col = "red") +
  geom_point(data = df_predict, col = "red")

df_predict

# A tibble: 100 x 2
        a       b
    <dbl>   <dbl>
 1  0.116   0.502
 2  0.870  -3.44 
 3  0.336   1.16 
 4 -1.16   -9.32 
 5  1.73    8.88 
 6  0.236   0.756
 7  0.485   0.302
 8 -1.13   -9.58 
 9 -0.778 -10.1  
10 -2.76   11.9  
# ... with 90 more rows

This gives the following graph. The raw data is plotted in blue, the red line comes from geom_smooth, and the red dots are the values in the df_predict data frame, obtained by calling predict on the loess model:

[Image: scatter plot of the raw data (blue) with the geom_smooth line and the predicted points (red)]
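If you want the smoothed values at evenly spaced x positions rather than only at the observed points, `predict` also accepts a `newdata` argument. The following is a rough sketch along the lines of the example above; the `grid` data frame and the `b_smooth` and `se` column names are my own choices for illustration.

```r
set.seed(1)
a <- rnorm(100)
b <- rnorm(100, mean = 4, sd = 20) * a
df <- data.frame(a, b)

# same kind of model as above
fit <- loess(b ~ a, data = df, span = 0.5)

# 50 evenly spaced points inside the observed range of a;
# with default settings loess does not extrapolate beyond that range
grid <- data.frame(a = seq(min(df$a), max(df$a), length.out = 50))

# se = TRUE makes predict() return a list with fit and se.fit
pred <- predict(fit, newdata = grid, se = TRUE)
grid$b_smooth <- pred$fit
grid$se <- pred$se.fit

head(grid)
```

The resulting `grid` data frame can be passed to `geom_line` or written out with `write.csv`, the same as any other data frame.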

Upvotes: 1

S Robidoux

Reputation: 11

These two lines are not the same:

ml <- with(rs, loess(formula = y~log(x), span = 0.5))

ml <- loess(formula = y~log(x), with(rs), span = 0.5)

I've tested the first one without any trouble (using some fake data). The second one fails because with(rs) is called without an expression to evaluate, which is what the "argument is missing" error refers to.

The following also works and has a structure similar to your second command, but it is a bit pointless, because with(rs, rs) simply returns rs:

ml <- loess(formula = y~log(x), with(rs, rs), span = 0.5)
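A more direct way to write the same thing is to pass the data frame through the `data` argument of `loess`. Here is a small sketch with made-up data (the `x` and `y` columns below are placeholders, as in the question) that checks both forms give the same fitted values:

```r
# made-up data with placeholder column names x and y
set.seed(123)
rs <- data.frame(x = 1:200)
rs$y <- 5 * log(rs$x) + rnorm(200)

# the working form from the question
ml_with <- with(rs, loess(formula = y ~ log(x), span = 0.5))

# the same model using the data argument instead of with()
ml_data <- loess(y ~ log(x), data = rs, span = 0.5)

# both fits should produce the same smoothed values
all.equal(unname(predict(ml_with)), unname(predict(ml_data)))
```

The `data` form also makes it easier to call `predict` with `newdata` later, because the model then looks up `x` by name in whatever data frame it is given.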

Upvotes: 1
