Reputation: 53
I have a set of data which I've re-sampled. Is there a command or function in R that I can use to smooth the data first, and only then create the graph from the resulting data frame?
My data has a lot of noise, and after re-sampling it I now want to smooth it out. I used geom_smooth to produce a graph of the data, but it only draws the smoothed curve; it doesn't give me the values of the points it represents.
My code (using ggplot2):
library(ggplot2)
library(dplyr)
library(plotly)

df <- read.csv("data.csv", header = TRUE)
str(df)

# re-sample 715 rows
rs <- sample_n(df, 715)

# raw data as a line, with a loess smoother drawn on top
q <-
  ggplot(df, aes(x, y)) +
  geom_line() +
  geom_smooth(method = "loess", formula = y ~ log(x), span = 0.05)
This is what I used to smooth the data. I chose loess with formula = y~log(x) and span = 0.05 because, of all the smoothing methods I've tried, it gives the result closest to what I want: a smoothed curve with the smallest deviation from the original data.
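If the goal is only to get the numbers that geom_smooth draws, ggplot2's `layer_data()` returns the data computed for one layer of a plot. A minimal sketch, assuming simulated `x`/`y` columns in place of the real data (which isn't available here) and a wider `span = 0.3` than the 0.05 above, because the toy data set is small:

```r
library(ggplot2)

# Simulated noisy data standing in for the real data set (assumption)
set.seed(1)
toy <- data.frame(x = 1:200)
toy$y <- 10 * log(toy$x) + rnorm(200, sd = 2)

p <- ggplot(toy, aes(x, y)) +
  geom_line() +
  geom_smooth(method = "loess", formula = y ~ log(x), span = 0.3)

# Layer 2 is the geom_smooth() layer; its computed data holds the smoothed curve
smooth_vals <- layer_data(p, 2)
head(smooth_vals[, c("x", "y", "ymin", "ymax")])
```

Note that the `x` values returned here are an evenly spaced grid (80 points by default, set by the `n` argument of geom_smooth), not the original observations. To get one smoothed value per original row, fitting `loess()` directly and calling `predict()` is simpler.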
Here is the output of head(rs) and glimpse(rs):
> head(rs)
        Date  DLTime   Time24   RH Temp PM2.5     CO2    MCO2 MPM25                   t
1 21/05/2019 8:33:21 15:21:36 73.5 25.9    34 1096.88 1096.88    34 2019-05-21 15:21:36
2 21/05/2019 8:56:33 15:44:48 75.4 25.6    32  975.00  975.00    32 2019-05-21 15:44:48
3 21/05/2019 8:22:43 15:10:58 75.9 26.1    59 1068.75 1068.75    59 2019-05-21 15:10:58
4 21/05/2019 8:51:53 15:40:08 74.7 25.6    45  975.00  975.00    45 2019-05-21 15:40:08
5 21/05/2019 8:47:30 15:35:45 75.0 25.7    40 1006.25 1006.25    40 2019-05-21 15:35:45
6 21/05/2019 8:35:59 15:24:14 73.7 25.8    32 1984.38 1068.75    32 2019-05-21 15:24:14
> glimpse(rs)
Observations: 715
Variables: 10
$ Date <fct> 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019,...
$ DLTime <fct> 8:33:21, 8:56:33, 8:22:43, 8:51:53, 8:47:30, 8:35:59, 8:17:13, 8:57:42, 8:20:34, 8:48:21, 8:34:...
$ Time24 <fct> 15:21:36, 15:44:48, 15:10:58, 15:40:08, 15:35:45, 15:24:14, 15:05:28, 15:45:57, 15:08:49, 15:36...
$ RH <dbl> 73.5, 75.4, 75.9, 74.7, 75.0, 73.7, 76.6, 75.1, 75.6, 75.1, 74.4, 75.6, 73.8, 76.6, 73.9, 76.3,...
$ Temp <dbl> 25.9, 25.6, 26.1, 25.6, 25.7, 25.8, 26.2, 25.6, 26.1, 25.7, 25.9, 25.8, 25.4, 26.2, 25.5, 26.2,...
$ PM2.5 <int> 34, 32, 59, 45, 40, 32, 42, 34, 35, 45, 36, 33, 29, 42, 46, 36, 42, 33, 35, 33, 39, 32, 39, 35,...
$ CO2 <dbl> 1096.88, 975.00, 1068.75, 975.00, 1006.25, 1984.38, 1328.13, 946.88, 1068.75, 1328.13, 1434.38,...
$ MCO2 <dbl> 1096.88, 975.00, 1068.75, 975.00, 1006.25, 1068.75, 1037.50, 946.88, 1068.75, 1021.88, 1112.50,...
$ MPM25 <dbl> 34, 32, 59, 45, 40, 32, 42, 34, 35, 45, 36, 33, 29, 42, 46, 36, 42, 33, 35, 33, 39, 32, 39, 35,...
$ t <dttm> 2019-05-21 15:21:36, 2019-05-21 15:44:48, 2019-05-21 15:10:58, 2019-05-21 15:40:08, 2019-05-21...
I have also tried
ml <- with(rs, loess(formula = y~log(x), span = 0.5))
mp <- predict(ml)
but it resulted in this error message
ml <- loess(formula = y~log(x), with(rs), span = 0.5)
Error in eval(substitute(expr), data, enclos = parent.frame()) :
argument is missing, with no default
I don't really understand where I went wrong, because none of the troubleshooting I've done on the internet gave me a definitive answer. If there are other methods, please let me know.
I apologize for not giving a reproducible example; I am not yet far enough into learning R to create random data. Any help is appreciated, thanks in advance.
Upvotes: 1
Views: 1098
Reputation: 53
Since I only needed the smoothed values of one variable, I used:
# smoothing out only one variable (Num is the row-number column described below)
ml <- loess(MCO2 ~ log(Num), data = rs, span = 0.5)
# predicting the values of the smooth data
mp <- predict(ml)
# insert predicted data values into resampled data frame
rs$pre <- mp
I added a new column, Num, to my data containing a running number (1 up to the last row), so I could use it as x in the `y~log(x)` formula. When I used the `t` variable instead, which is an `as.POSIXct` date-time, it resulted in an error.
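The error with `t` most likely comes from `log()` itself: R does not define `log()` for date-time (`POSIXct`) objects. An alternative to a separate row-number column is to convert the time stamps to numbers, e.g. seconds since the first measurement. A minimal sketch, assuming a small simulated data frame with a `POSIXct` column `t` and a measurement column `MCO2` (the real data isn't available here); note that it smooths on the time axis directly, without the `log()` transform:

```r
# Simulated stand-in for the resampled data (assumption): 100 readings 30 s apart,
# shuffled the way sample_n() would leave them
set.seed(42)
start <- as.POSIXct("2019-05-21 15:00:00", tz = "UTC")
rs <- data.frame(t = start + 30 * (0:99))
rs$MCO2 <- 1000 + 50 * sin(seq_len(100) / 10) + rnorm(100, sd = 20)
rs <- rs[sample(nrow(rs)), ]

# log() is not defined for POSIXct, so a formula using log(t) cannot be evaluated
res <- try(log(rs$t), silent = TRUE)

# Numeric time axis: seconds elapsed since the first measurement
rs$secs <- as.numeric(difftime(rs$t, min(rs$t), units = "secs"))

ml <- loess(MCO2 ~ secs, data = rs, span = 0.5)
rs$pre <- predict(ml)  # one smoothed value per row, in the same row order as rs
```

Because the time axis is derived from `t`, the smoothed values stay tied to the actual measurement times even though the sampled rows are in random order.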
To keep the values of the predicted data, I used:
write.csv(rs,"newdata.csv", row.names = FALSE)
Thank you for all the help and answers.
@Andrew Baxter
@S Robidoux
@Jon Spring
@Jimbou
Upvotes: 0
Reputation: 7636
As mentioned above, the loess function smooths the data the same way geom_smooth(method = "loess") does; when the fitted model is then passed to the predict function, you get a vector of the new (smoothed) values of the dependent variable. You can plot these on a graph to check:
library(dplyr)
library(ggplot2)
a <- rnorm(100)
b <- rnorm(100, mean = 4, sd = 20)*a
df <- tibble(a,b)
df_predict <- df[,"a"]
df_predict[,"b"] <- df %>%
loess(b ~ a, data = ., span = 0.5) %>%
predict()
df %>%
ggplot(aes(a,b)) +
geom_point(col = "blue") +
geom_smooth(method = "loess", span = 0.5, col = "red") +
geom_point(data = df_predict, col = "red")
df_predict
# A tibble: 100 x 2
a b
<dbl> <dbl>
1 0.116 0.502
2 0.870 -3.44
3 0.336 1.16
4 -1.16 -9.32
5 1.73 8.88
6 0.236 0.756
7 0.485 0.302
8 -1.13 -9.58
9 -0.778 -10.1
10 -2.76 11.9
# ... with 90 more rows
This gives the following graph, with the raw data plotted in blue, the red line from the geom_smooth function, and the red dots from the loess model passed to predict (the df_predict data frame):
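predict() without `newdata` returns one fitted value per original observation. To get an evenly spaced curve like the one geom_smooth draws, you can instead predict on a grid of `a` values. A minimal sketch; the `set.seed()` call, the `grid_df` name and the 80-point grid (mirroring geom_smooth's default `n`) are additions for illustration:

```r
library(dplyr)
library(ggplot2)

set.seed(123)
a <- rnorm(100)
b <- rnorm(100, mean = 4, sd = 20) * a
df <- tibble(a, b)

fit <- loess(b ~ a, data = df, span = 0.5)

# Evenly spaced grid over the observed range of a
grid_df <- tibble(a = seq(min(df$a), max(df$a), length.out = 80))
grid_df$b <- predict(fit, newdata = grid_df)

ggplot(df, aes(a, b)) +
  geom_point(col = "blue") +
  geom_line(data = grid_df, col = "red")
```

The grid stays within the range of the observed `a` values on purpose: with loess's default interpolation surface, predictions outside that range come back as NA.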
Upvotes: 1
Reputation: 11
These two lines are not the same:
ml <- with(rs, loess(formula = y~log(x), span = 0.5))
ml <- loess(formula = y~log(x), with(rs), span = 0.5)
I've tested the first one without any trouble (using some faked data). The second one fails because you haven't given with() an expression to evaluate.
This also works and has a similar structure to the second command you provided, but it is a bit pointless, since with(rs, rs) simply returns rs (so it is the same as passing rs as the data argument):
ml <- loess(formula = y~log(x), with(rs, rs), span = 0.5)
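The more usual way to write this is to pass the data frame through loess()'s own `data` argument. A minimal sketch, assuming placeholder columns named `x` and `y` (the generic names used in the question's formula) filled with simulated values; `x` is kept positive so that `log(x)` is defined:

```r
# Simulated stand-in for rs with generic x/y columns (assumption)
set.seed(7)
rs <- data.frame(x = runif(150, min = 1, max = 100))
rs$y <- 3 * log(rs$x) + rnorm(150, sd = 0.5)

# The working form from the question: with() evaluates the loess() call inside rs
ml_with <- with(rs, loess(y ~ log(x), span = 0.5))

# Equivalent, more common form: hand the data frame to loess() directly
ml_data <- loess(y ~ log(x), data = rs, span = 0.5)

mp <- predict(ml_data)  # smoothed value for each row of rs
```

Both forms fit the same model on the same rows, so their fitted values should be identical; the `data =` version is easier to read and keeps the data frame attached to the model.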
Upvotes: 1