SJJ

Reputation: 53

Scatter plot and line of best fit - Two sets

I am new to R. I have a data set with men's and women's race times, and I can already draw it as a scatter plot. Now I would like to add two lines of best fit: one for the men's data and one for the women's data. Can anyone help?

   #Clear out old variables
   rm(list=ls())

   #Insert Data
   library(readxl)
   gender_data <- read_excel("Desktop/gender_data.xlsx")
   View(gender_data)
   library(ggplot2)

   #Matrix 
   times_df <- data.frame(gender_data)
   print(gender_data)

   #data set men data
   plot(x = gender_data$`Olympic year`,
        y = gender_data$`Men's winning time (s)`,
        xlab = "year", ylab = "times", ylim = c(7, 13),
        col = "green", pch = "*")

   #data set women data
   points(x = gender_data$`Olympic year`,
          y = gender_data$`Women's winning time (s)`,
          col = "blue", pch = "`")

Here is my data:

gender_data <-
structure(list(`Olympic year` = c(1900, 1904, 1908, 1912, 1916, 
1920, 1924, 1928, 1932, 1936, 1940, 1944, 1948, 1952, 1956, 1960, 
1964, 1968, 1972, 1976, 1980, 1984, 1988, 1992, 1996, 2000, 2004
), `Men's winning time (s)` = c(11, 11, 10.8, 10.8, NA, 10.8, 
10.6, 10.8, 10.3, 10.3, NA, NA, 10.3, 10.4, 10.5, 10.2, 10, 9.95, 
10.14, 10.06, 10.25, 9.99, 9.92, 9.96, 9.84, 9.87, 9.85), 
`Women's winning time (s)` = c(NA, NA, NA, NA, NA, NA, NA, 12.2, 
11.9, 11.5, NA, NA, 11.9, 11.5, 11.5, 11, 11.4, 11.08, 11.07, 11.08, 
11.06, 10.97, 10.54, 10.82, 10.94, 10.75, 10.93)), 
class = "data.frame", row.names = c(NA, -27L))

Upvotes: 2

Views: 1885

Answers (2)

Rui Barradas

Reputation: 76402

This type of problem generally comes down to reshaping the data: the data is in wide format, and it should be in long format. See this post on how to reshape data from wide to long format.
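To make the wide versus long distinction concrete, here is a minimal sketch on a two-row excerpt of the question's data. The object names `wide` and `long` are only for illustration; `name` and `value` are the default output column names of `pivot_longer()`.

```r
library(tidyr)

# Two-row excerpt of the question's wide data (illustration only)
wide <- data.frame(
  `Olympic year` = c(1928, 1932),
  `Men's winning time (s)` = c(10.8, 10.3),
  `Women's winning time (s)` = c(12.2, 11.9),
  check.names = FALSE  # keep the non-syntactic column names as they are
)

# Every column except the year is stacked into name/value pairs,
# giving one row per year and series
long <- pivot_longer(wide, -`Olympic year`)
print(long)
```

After reshaping, the series label lives in a single column, so it can be used for subsetting or for colouring by group.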

Here is a solution that uses base R graphics for the plot (tidyr is only needed for the reshaping).

library(tidyr)

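# Reshape to long format: one row per year and series (columns `name`, `value`)
gender_long <- pivot_longer(gender_data, -`Olympic year`)

# Colour each point by its series instead of relying on the row order
plot(value ~ `Olympic year`, gender_long,
     col = ifelse(gender_long$name == "Men's winning time (s)", "blue", "red"))
# One regression line per series; lm() drops the NA rows by default
abline(lm(value ~ `Olympic year`,
          data = gender_long,
          subset = name == "Men's winning time (s)"),
       col = "blue")
abline(lm(value ~ `Olympic year`,
          data = gender_long,
          subset = name == "Women's winning time (s)"),
       col = "red")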
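If you would rather skip the reshaping step, a possible alternative is to fit one model per column of the wide data. This is only a sketch on a five-row excerpt of the question's data; `fit_men` and `fit_women` are names I made up for the example.

```r
# Five-row excerpt of the question's wide data (illustration only)
gender_data <- data.frame(
  `Olympic year` = c(1928, 1932, 1936, 1948, 1952),
  `Men's winning time (s)` = c(10.8, 10.3, 10.3, 10.3, 10.4),
  `Women's winning time (s)` = c(12.2, 11.9, 11.5, 11.9, 11.5),
  check.names = FALSE  # keep the non-syntactic column names as they are
)

# One linear model per column; backticks are needed for these names
fit_men   <- lm(`Men's winning time (s)`   ~ `Olympic year`, data = gender_data)
fit_women <- lm(`Women's winning time (s)` ~ `Olympic year`, data = gender_data)

# Scatter plot of both series, then one fitted line for each
plot(gender_data$`Olympic year`, gender_data$`Men's winning time (s)`,
     xlab = "year", ylab = "times", ylim = c(7, 13), col = "blue")
points(gender_data$`Olympic year`, gender_data$`Women's winning time (s)`,
       col = "red")
abline(fit_men, col = "blue")
abline(fit_women, col = "red")
```

With the full `gender_data` from the question, the excerpt definition can be dropped; the rows with missing times are left out of each fit by `lm()`'s default NA handling.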


Upvotes: 2

Duck

Reputation: 39585

Try ggplot2 together with tidyverse functions. Reshape the data to long format, keeping the year as the identifier, then use geom_point() for the scatter plot and geom_smooth() for the line of best fit. With method='lm' you get a straight regression line per group; if you leave it out, geom_smooth() falls back to its default, which is loess for a data set this small. Here is the code:

library(dplyr)
library(tidyr)
library(ggplot2)
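#Code
# Reshape to long, then draw the points and one linear fit per series.
# factor() makes the x axis discrete, hence the rotated tick labels.
gender_data %>% pivot_longer(-c(`Olympic year`)) %>%
  ggplot(aes(x=factor(`Olympic year`),y=value,color=name,group=name))+
  geom_point()+
  geom_smooth(method = 'lm',se=FALSE)+
  theme(axis.text.x = element_text(angle = 90),
        legend.position = 'top')+
  labs(x='Year',color='Variable')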


With the default smoothing method (loess here) instead of a linear fit, the code would be:

#Code 2
# Same plot, but geom_smooth() uses its default method instead of 'lm'
gender_data %>% pivot_longer(-c(`Olympic year`)) %>%
  ggplot(aes(x=factor(`Olympic year`),y=value,color=name,group=name))+
  geom_point()+
  geom_smooth(se=FALSE)+
  theme(axis.text.x = element_text(angle = 90),
        legend.position = 'top')+
  labs(x='Year',color='Variable')

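As a variation, the year can also stay numeric instead of being converted with factor(), so the x axis is a continuous scale. This is only a sketch on a five-row excerpt of the question's data; the object names `long` and `p` and the y-axis label are my own choices.

```r
library(tidyr)
library(ggplot2)

# Five-row excerpt of the question's wide data (illustration only)
gender_data <- data.frame(
  `Olympic year` = c(1928, 1932, 1936, 1948, 1952),
  `Men's winning time (s)` = c(10.8, 10.3, 10.3, 10.3, 10.4),
  `Women's winning time (s)` = c(12.2, 11.9, 11.5, 11.9, 11.5),
  check.names = FALSE  # keep the non-syntactic column names as they are
)

# Long format: one row per year and series
long <- pivot_longer(gender_data, -`Olympic year`)

# Numeric x axis; mapping color to name also groups the linear fits
p <- ggplot(long, aes(x = `Olympic year`, y = value, color = name)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Year", y = "Winning time (s)", color = "Variable") +
  theme(legend.position = "top")
print(p)
```

With the full data, ggplot2 removes the rows with missing times and prints a warning about it, which is expected.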

Upvotes: 3
