JohnDoe

Reputation: 41

Adding a trend line to a scatterplot using R

I have a data set with the number of people at each age (0 to 105+), recorded for every year from 1846 to 2014. I am making a scatterplot of the total number of people per year; there is one data set for males and one for females. I then want to add a trend line, but I can't figure out how.

This is what I've got so far:

B <- as.matrix(read.table("clipboard"))
head(B)
age <- 0:105
y <- 1846:2014
plot(c(1846:2014), c(colSums(B)), col=3, xlab="Year", ylab="Summed age", main="Summed people")

This gives me the plot, but I am not sure how to add the trend line. Please help. The plot looks like this: https://www.dropbox.com/s/5dono5bjrmqylcp/Plot.png?dl=0

Data available here: https://www.ssb.no/statistikkbanken/SelectVarVal/Define.asp?subjectcode=01&ProductId=01&MainTable=FolkemEttAarig&SubTable=1&PLanguage=1&nvl=True&Qid=0&gruppe1=Hele&gruppe2=Hele&gruppe3=Hele&VS1=AlleAldre00B&VS2=Kjonn3&VS3=&mt=0&KortNavnWeb=folkemengde&CMSSubjectArea=befolkning&StatVariant=&checked=true

Upvotes: 4

Views: 26824

Answers (1)

Ben Bolker

Reputation: 226162

I downloaded your data file and posted it somewhere accessible.

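urlsrc <- "http://www.math.mcmaster.ca/bolker/misc"
urlfn <- "201512516853914205393FolkemEttAarig.tsv"
## check.names=FALSE keeps the year column names as "1846", not "X1846"
d <- read.delim(url(paste(urlsrc,urlfn,sep="/")),header=TRUE,
                check.names=FALSE)
## keep columns 3-171: the 169 yearly columns (1846-2014)
dm <- d[,3:171]
## recover the years from the column names
y <- as.numeric(names(dm))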

Now make the plot:

plot(y, colSums(dm),
     col=3, xlab="Year", ylab="Summed age", main="Summed people")
## fit a linear regression of the yearly totals on year and draw the fitted line
abline(lm(colSums(dm) ~ y))
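
If the download link is unavailable, the same `lm()` plus `abline()` pattern can be tried on made-up data. This is only a sketch: the `year` and `total` vectors below are synthetic stand-ins I invented for illustration, not the real population figures.

```r
# Synthetic stand-in for the yearly totals (NOT the real population data)
set.seed(1)
year  <- 1846:2014
total <- 1.4e6 + 2.2e4 * (year - 1846) + rnorm(length(year), sd = 5e4)

# Ordinary least-squares fit of total on year
fit <- lm(total ~ year)

plot(year, total, col = 3, xlab = "Year", ylab = "Total population")
abline(fit, lwd = 2)   # abline() takes intercept and slope from the fitted model

coef(fit)              # named vector with "(Intercept)" and "year" (the slope)
```

Storing the fit in a variable rather than calling `lm()` inside `abline()` makes it easy to inspect the slope afterwards with `coef()` or `summary()`.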


You can also do it with ggplot2, after reshaping the data to long format:

library("tidyr")
library("ggplot2"); theme_set(theme_bw())
library("dplyr")
d2 <- gather(dm,year,pop,convert=TRUE)
d3 <- d2 %>% group_by(year) %>% summarise(pop=mean(pop))
ggplot(d3,aes(year,pop)) + geom_point() + 
    geom_smooth(method="lm")


geom_smooth() draws a confidence band (95% by default) around this trend line, but here it is so narrow that it's hard to see.
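
For the base-graphics plot, a comparable band can be drawn by hand with `predict()`. The following is a minimal sketch on synthetic data (the `year` and `total` vectors are invented for illustration); it assumes a plain linear model, as in the plots above.

```r
# Synthetic stand-in for the yearly totals (NOT the real population data)
set.seed(1)
year  <- 1846:2014
total <- 1.4e6 + 2.2e4 * (year - 1846) + rnorm(length(year), sd = 5e4)

fit <- lm(total ~ year)

# 95% confidence interval for the fitted mean at each year;
# the result is a matrix with columns "fit", "lwr" and "upr"
band <- predict(fit, newdata = data.frame(year = year),
                interval = "confidence", level = 0.95)

plot(year, total, col = 3, xlab = "Year", ylab = "Total population")
abline(fit, lwd = 2)
lines(year, band[, "lwr"], lty = 2)   # lower edge of the band
lines(year, band[, "upr"], lty = 2)   # upper edge of the band
```

With this many points the dashed lines sit very close to the fitted line, which is the same effect that makes the ggplot2 band hard to see.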

Update: I accidentally used the mean instead of the sum in the second plot. To match the first plot, change summarise(pop=mean(pop)) to summarise(pop=sum(pop)).
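
One way to get the summed version without reshaping is to build the plotting data frame directly from `colSums()`. This is a sketch only: `dm_demo` is a tiny hypothetical stand-in for `dm` (one row per age, one column per year), and `d3_sum` is a name introduced here for illustration. The plotting step is skipped if ggplot2 is not installed.

```r
# Hypothetical stand-in for dm: rows are ages, columns are years
dm_demo <- data.frame(`1846` = c(10, 20, 30),
                      `1847` = c(11, 21, 31),
                      `1848` = c(12, 22, 33),
                      check.names = FALSE)

# One row per year, with the total summed over all ages
d3_sum <- data.frame(year = as.numeric(names(dm_demo)),
                     pop  = colSums(dm_demo))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d3_sum, aes(year, pop)) + geom_point() +
    geom_smooth(method = "lm")
  print(p)
}
```

With the real data, replacing `dm_demo` by `dm` should give the same totals as the base-graphics plot.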

Upvotes: 9
