Reputation:
Here is my data:
df1<-read.table(text=" x y
2 20
3 36
3 48
1 20
3 40
3 32
1 16
1 20
3 24
3 28
3 32
4 36
2 20
4 44
4 36
4 40
4 48
3 40
4 52
4 52
4 52
4 44
4 48
4 52
1 16
3 32
4 52
3 32
3 36
",header=TRUE)
I want to run a Monte Carlo simulation based on df1.
Here is what I have tried so far:
library(dplyr)  # provides %>% and sample_n()
df2 <- df1 %>% sample_n(size = 1000, replace = TRUE)
lm(y~x,data=df2)
Is this correct, or is there a better approach? Do I need to estimate the intercept "a" and the slope "b" first and then simulate new data from the fitted model? If so, could you show me how, please?
Upvotes: 0
Views: 731
Reputation: 4150
Here is another answer, admittedly less clear, that uses the bootstrap tools from tidymodels (rsample):
library(tidymodels)
set.seed(42)
bootstrap_data <- df1 %>%
rsample::bootstraps(100)
# Fit the regression on the resampled (analysis) rows of one bootstrap split
fit_lm_on_bootstrap <- function(split) {
  lm(y ~ x, data = analysis(split))
}
boot_models <- bootstrap_data %>%
mutate(model = map(.x = splits,fit_lm_on_bootstrap),
tidy_results = map(model,tidy)) %>%
unnest(tidy_results)
boot_models %>%
filter(term == "(Intercept)") %>%
summarise_at(vars(estimate:p.value),mean)
# A tibble: 1 x 4
estimate std.error statistic p.value
<dbl> <dbl> <dbl> <dbl>
1 4.07 3.77 1.23 0.298
boot_models %>%
filter(term == "x") %>%
summarise_at(vars(estimate:p.value),mean)
# A tibble: 1 x 4
estimate std.error statistic p.value
<dbl> <dbl> <dbl> <dbl>
1 10.4 1.16 9.25 0.000000136
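The tables above average the per-model std.error and p.value columns, which is hard to interpret. A more common bootstrap summary is the spread of the coefficient estimates themselves. Below is a minimal base R sketch of that idea, not a definitive implementation: `n_boot` and `boot_coefs` are names I made up for illustration, and the number of resamples is an arbitrary choice.

```r
# df1 re-created from the question so the sketch runs on its own
df1 <- data.frame(
  x = c(2, 3, 3, 1, 3, 3, 1, 1, 3, 3, 3, 4, 2, 4, 4, 4, 4, 3, 4, 4,
        4, 4, 4, 4, 1, 3, 4, 3, 3),
  y = c(20, 36, 48, 20, 40, 32, 16, 20, 24, 28, 32, 36, 20, 44, 36,
        40, 48, 40, 52, 52, 52, 44, 48, 52, 16, 32, 52, 32, 36)
)

set.seed(42)
n_boot <- 1000  # number of resamples (assumed value, adjust as needed)

# Resample rows with replacement, refit the model, keep both coefficients.
# The result has one row per resample and one column per coefficient.
boot_coefs <- t(replicate(n_boot, {
  idx <- sample(nrow(df1), replace = TRUE)
  coef(lm(y ~ x, data = df1[idx, ]))
}))

# Bootstrap standard errors: standard deviation of the estimates
apply(boot_coefs, 2, sd)

# 95% percentile intervals for the intercept and the slope
apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975))
```

The percentile interval is only one of several bootstrap interval types; it is used here because it needs nothing beyond `quantile()`.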
Upvotes: 2
Reputation: 4150
Another neat option is the infer package, which bootstraps a statistic directly:
library(tidyverse)
library(infer)
df1 %>%
specify(y ~ x) %>%
generate(reps = 100, type = "bootstrap") %>%
calculate(stat = "correlation") %>%
summarise(mean_cor = mean(stat), sd = sd(stat))
df1 %>%
specify(y ~ x) %>%
generate(reps = 100, type = "bootstrap") %>%
calculate(stat = "slope") %>%
summarise(beta = mean(stat), sd = sd(stat))
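The question also asks about estimating "a" and "b" first and then simulating. That is a parametric Monte Carlo, as opposed to the row resampling shown above. Here is a rough base R sketch, assuming the errors are independent and normal with constant variance (an assumption of mine, not something the data confirms). The names `sigma_hat`, `n_sim` and `sim_coefs` are just illustrative.

```r
# df1 re-created from the question so the sketch runs on its own
df1 <- data.frame(
  x = c(2, 3, 3, 1, 3, 3, 1, 1, 3, 3, 3, 4, 2, 4, 4, 4, 4, 3, 4, 4,
        4, 4, 4, 4, 1, 3, 4, 3, 3),
  y = c(20, 36, 48, 20, 40, 32, 16, 20, 24, 28, 32, 36, 20, 44, 36,
        40, 48, 40, 52, 52, 52, 44, 48, 52, 16, 32, 52, 32, 36)
)

set.seed(42)

# Step 1: estimate a (intercept), b (slope) and the residual SD from df1
fit <- lm(y ~ x, data = df1)
a <- coef(fit)[["(Intercept)"]]
b <- coef(fit)[["x"]]
sigma_hat <- sigma(fit)

n_sim <- 1000  # number of simulated data sets (assumed value)

# Step 2: simulate new y values from the fitted line plus normal noise,
# keeping the observed x values fixed, then refit on each simulated set
sim_coefs <- t(replicate(n_sim, {
  sim <- data.frame(
    x = df1$x,
    y = a + b * df1$x + rnorm(nrow(df1), mean = 0, sd = sigma_hat)
  )
  coef(lm(y ~ x, data = sim))
}))
colnames(sim_coefs) <- c("a", "b")

# Step 3: summarise the simulated sampling distribution
colMeans(sim_coefs)
apply(sim_coefs, 2, sd)
```

If the normal-error assumption looks doubtful, the resampling approaches above avoid it, since they reuse the observed rows instead of generating new ones from a model.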
Upvotes: 0