Reputation: 41
I have a data set with the number of people at each age (0 to 105+), recorded for each year from 1846 to 2014, and I am making a scatterplot of the total number of people by year; there is one data set for males and one for females. I then want to add a trend line, but I can't figure out how.
This is what I've got so far:
B <- as.matrix(read.table("clipboard"))
head(B)
age <- 0:105
y <- 1846:2014
plot(y, colSums(B), col=3, xlab="Year", ylab="Summed age", main="Summed people")
This gives me the plot, but I am not sure how to add the trend line. Please help. The plot looks like this: https://www.dropbox.com/s/5dono5bjrmqylcp/Plot.png?dl=0
Upvotes: 4
Views: 26824
Reputation: 226162
I downloaded your data file and posted it somewhere accessible.
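# Location of the posted copy of the data
urlsrc <- "http://www.math.mcmaster.ca/bolker/misc"
urlfn <- "201512516853914205393FolkemEttAarig.tsv"
# check.names=FALSE keeps the column names as they appear in the file
d <- read.delim(url(paste(urlsrc, urlfn, sep="/")), header=TRUE,
                check.names=FALSE)
dm <- d[, 3:171]            # columns 3 to 171 are named by year
y <- as.numeric(names(dm))  # years as numbers, for the x axis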
Now make the plot and add the least-squares line from lm() with abline():
plot(y, colSums(dm),
     col=3, xlab="Year", ylab="Summed age", main="Summed people")
abline(lm(colSums(dm) ~ y))
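If the download is not available, the same steps can be tried on made-up data. The following is only a sketch: the matrix `B` here is simulated (Poisson counts with a rising mean, both assumptions for illustration) and just mimics the shape of the real data, with 106 age rows and 169 year columns.

```r
# Simulated stand-in for the real data: 106 ages (0-105+) by 169 years.
set.seed(1)
years <- 1846:2014
lambda <- rep(seq(100, 300, length.out = length(years)), each = 106)
B <- matrix(rpois(106 * length(years), lambda = lambda),
            nrow = 106, dimnames = list(0:105, years))

total <- colSums(B)       # total number of people in each year
fit <- lm(total ~ years)  # straight-line (least-squares) trend

plot(years, total, col = 3, xlab = "Year", ylab = "Total people")
abline(fit)               # draw the line from the fitted intercept and slope
```

`abline()` accepts the fitted `lm` object directly, so there is no need to extract the coefficients by hand.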
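Alternatively, reshape the data to long format with tidyr, summarise by year with dplyr, and let ggplot2 draw the fitted line:
library("tidyr")
library("ggplot2"); theme_set(theme_bw())
library("dplyr")
# long format: one row per (year, pop) pair; convert=TRUE turns year into a number
d2 <- gather(dm, year, pop, convert=TRUE)
# one summary value per year
d3 <- d2 %>% group_by(year) %>% summarise(pop=mean(pop))
ggplot(d3, aes(year, pop)) + geom_point() +
    geom_smooth(method="lm")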
There is a confidence interval around this trend line, but it's so narrow that it's hard to see.
Update: I accidentally used the mean instead of the sum in the second plot. To plot totals, replace mean(pop) with sum(pop) in the summarise() call.
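For reference, here is a minimal sketch of a sum-based version of the ggplot2 plot. It is not the code from the answer: it uses simulated data (the matrix `dm_sim` is hypothetical) and computes the yearly totals with `colSums()` rather than reshaping with tidyr and dplyr.

```r
library(ggplot2)

# Simulated stand-in for the year columns: 106 age rows by 169 year columns.
set.seed(1)
years <- 1846:2014
dm_sim <- matrix(rpois(106 * length(years), lambda = 200),
                 nrow = 106, dimnames = list(NULL, years))

# One row per year, holding the total over all ages.
d3 <- data.frame(year = as.numeric(colnames(dm_sim)),
                 pop  = colSums(dm_sim))

p <- ggplot(d3, aes(year, pop)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)  # linear fit with confidence band
print(p)
```

Passing `se = FALSE` to `geom_smooth()` drops the confidence band if only the line is wanted.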
Upvotes: 9