Reputation: 93
I have a CSV file that contains population estimates for 2010-2019. I've used the predict()
function to estimate the population from 2020 to 2024. How would I combine these two plots so that 2020 starts where 2019 left off on the x-axis? Would the function ggarrange be the best option?
Also, how would I change the x-axis tick marks to show 2020, 2021, 2022, 2023, 2024? It currently just shows 1, 2, 3, 4, 5. I tried the scale_x_discrete
function, but to no avail.
library(ggplot2)
library(tidyr)
library(tidyverse)
pops <- read_csv("nst-est2019-popchg2010_2019.csv")
OK_pops <- filter(pops, NAME == "Oklahoma")
pop_OK <- pivot_longer(OK_pops,
  cols = starts_with("POP"),
  names_to = "Year",
  names_prefix = "POPESTIMATE",
  values_to = "Population"
)
options(digits=4)
pop_OK <- transform(pop_OK, Population=as.numeric(Population))
pop_OK <- transform(pop_OK, Year=as.numeric(Year))
str(pop_OK)
ggplot(pop_OK) + geom_point(aes(x=Year, y=Population))
abline(pop_OK)
model <- lm(formula = Population ~ Year, data = pop_OK)
summary(model)
pred <- predict(model, newdata=data.frame(Year=2020:2024))
setNames(pred, 2020:2024)
plot(pred, pch = 16, col = "blue")
scale_x_discrete(breaks = c("1", "2", "3", "4", "5"),
                 labels = c("2020", "2021", "2022", "2023", "2024"))
Upvotes: 0
Views: 449
Reputation: 126
You need to use rbind, similar to this:
new_data <- rbind(pop_OK[, c("Year", "Population")], data.frame(Year = 2020:2024, Population = as.numeric(pred)))
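A fuller sketch of the rbind approach, assuming the data has numeric Year and Population columns as in the question. The population figures are made-up placeholders so that the snippet runs on its own (they are not Census values), and the `future` data frame and `Source` column are names I introduced for illustration.

```r
library(ggplot2)

# Placeholder data standing in for pop_OK from the question
# (illustrative numbers only, NOT real Census estimates).
pop_OK <- data.frame(
  Year       = 2010:2019,
  Population = 1000 + 20 * (0:9) + c(3, -2, 1, -4, 2, 0, -1, 4, -3, 0)
)

model <- lm(Population ~ Year, data = pop_OK)

# Predict 2020-2024 and keep the years next to the predictions.
future <- data.frame(Year = 2020:2024)
future$Population <- as.numeric(predict(model, newdata = future))

# Label each row so the two series can be told apart in one plot.
pop_OK$Source <- "Estimate"
future$Source <- "Prediction"

# Stack observed and predicted rows into a single data frame.
new_data <- rbind(pop_OK[, c("Year", "Population", "Source")], future)

# Year is numeric, so use a continuous scale with a tick at every year.
p <- ggplot(new_data, aes(x = Year, y = Population, colour = Source)) +
  geom_point() +
  scale_x_continuous(breaks = 2010:2024)
print(p)
```

Since everything ends up in one data frame and one plot, ggarrange should not be needed here. Because Year is numeric, scale_x_discrete does not apply; scale_x_continuous(breaks = ...) is what controls where the ticks appear.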
Note that predict() returns only the fitted values by default. If you call it with interval = "confidence", the output is a matrix with three columns: fit, lwr (lower) and upr (upper). If you then keep only the fit column, you lose the upper and lower confidence limits.
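As a rough illustration of keeping the interval columns, here is a minimal sketch. The data is again a made-up placeholder (not Census values), and the `future` and `pred_df` names are my own; `interval = "prediction"` can be swapped in the same way if prediction intervals are wanted instead.

```r
# Placeholder data standing in for pop_OK from the question
# (illustrative numbers only, NOT real Census estimates).
pop_OK <- data.frame(
  Year       = 2010:2019,
  Population = 1000 + 20 * (0:9) + c(3, -2, 1, -4, 2, 0, -1, 4, -3, 0)
)

model  <- lm(Population ~ Year, data = pop_OK)
future <- data.frame(Year = 2020:2024)

# With interval = "confidence", predict() returns a matrix with
# columns fit, lwr and upr (one row per requested year).
pred <- predict(model, newdata = future, interval = "confidence")

# Bind the matrix columns to the years so that nothing is dropped.
pred_df <- cbind(future, as.data.frame(pred))
print(pred_df)
```

Since pred is a matrix in this case, a single column is selected with pred[, "fit"] rather than pred$fit.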
Hope this helps.
Upvotes: 1