Reputation: 53
I am new to R. I have a data set of men's and women's winning race times, and I have plotted both on a scatter plot. Now I would like to add two lines of best fit: one for the men's data and one for the women's data. Can anyone help?
# Clear out old variables
rm(list = ls())
# Load the data
library(readxl)
gender_data <- read_excel("Desktop/gender_data.xlsx")
View(gender_data)
library(ggplot2)
# Copy to a plain data frame (data.frame() also makes the column names syntactic, e.g. Olympic.year)
times_df <- data.frame(gender_data)
print(gender_data)
# Men's data
plot(x = gender_data$`Olympic year`,
     y = gender_data$`Men's winning time (s)`,
     xlab = "year", ylab = "times", ylim = c(7, 13),
     col = "green", pch = "*")
# Women's data
points(x = gender_data$`Olympic year`,
       y = gender_data$`Women's winning time (s)`,
       col = "blue", pch = "`")
Here is my data:
gender_data <-
structure(list(`Olympic year` = c(1900, 1904, 1908, 1912, 1916,
1920, 1924, 1928, 1932, 1936, 1940, 1944, 1948, 1952, 1956, 1960,
1964, 1968, 1972, 1976, 1980, 1984, 1988, 1992, 1996, 2000, 2004
), `Men's winning time (s)` = c(11, 11, 10.8, 10.8, NA, 10.8,
10.6, 10.8, 10.3, 10.3, NA, NA, 10.3, 10.4, 10.5, 10.2, 10, 9.95,
10.14, 10.06, 10.25, 9.99, 9.92, 9.96, 9.84, 9.87, 9.85),
`Women's winning time (s)` = c(NA, NA, NA, NA, NA, NA, NA, 12.2,
11.9, 11.5, NA, NA, 11.9, 11.5, 11.5, 11, 11.4, 11.08, 11.07, 11.08,
11.06, 10.97, 10.54, 10.82, 10.94, 10.75, 10.93)),
class = "data.frame", row.names = c(NA, -27L))
Reputation: 76402
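This type of problem is usually about reshaping the data. Plotting one series per group works best with data in long format (one row per observation, plus a column that identifies the group), but your data are in wide format (one column per group). See this post on how to reshape data from wide to long format.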
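For comparison, the wide-to-long step can also be done without extra packages using base R's `stats::reshape()`. This is a minimal sketch, assuming the `gender_data` from the question (repeated here so the block runs on its own); the output column names `name` and `value` were chosen to match what `tidyr::pivot_longer()` produces. One difference: `reshape()` stacks all men's rows first and then all women's rows, while `pivot_longer()` alternates them within each year.

```r
# Data from the question (same values as the dput() output)
gender_data <- data.frame(
  `Olympic year` = seq(1900, 2004, by = 4),
  `Men's winning time (s)` = c(11, 11, 10.8, 10.8, NA, 10.8, 10.6, 10.8, 10.3, 10.3, NA, NA,
                               10.3, 10.4, 10.5, 10.2, 10, 9.95, 10.14, 10.06, 10.25, 9.99,
                               9.92, 9.96, 9.84, 9.87, 9.85),
  `Women's winning time (s)` = c(NA, NA, NA, NA, NA, NA, NA, 12.2, 11.9, 11.5, NA, NA,
                                 11.9, 11.5, 11.5, 11, 11.4, 11.08, 11.07, 11.08, 11.06,
                                 10.97, 10.54, 10.82, 10.94, 10.75, 10.93),
  check.names = FALSE  # keep the original column names with spaces
)

time_cols <- c("Men's winning time (s)", "Women's winning time (s)")

# Wide -> long with base R: one row per (year, series) pair
gender_long <- reshape(
  gender_data,
  direction = "long",
  varying   = time_cols,      # columns to stack
  v.names   = "value",        # name of the stacked value column
  timevar   = "name",         # records which original column each value came from
  times     = time_cols,
  idvar     = "Olympic year"
)
rownames(gender_long) <- NULL
head(gender_long)
```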
Here is a solution that draws the plot with base R graphics (tidyr is only used for the reshaping step).
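library(tidyr)
# Reshape to long format: one row per year and series, with columns `name` and `value`
gender_long <- pivot_longer(gender_data, -`Olympic year`)
# pivot_longer() alternates Men/Women rows within each year, so the two colours recycle in step
plot(value ~ `Olympic year`, gender_long, col = c("blue", "red"))
# Fit and draw one regression line per series
abline(lm(value ~ `Olympic year`,
          data = gender_long,
          subset = name == "Men's winning time (s)"),
       col = "blue")
abline(lm(value ~ `Olympic year`,
          data = gender_long,
          subset = name == "Women's winning time (s)"),
       col = "red")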
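If you would rather keep the question's original wide-format plot, the same idea also works without reshaping: fit one `lm()` per time column and pass each fit to `abline()`. A minimal sketch, assuming the `gender_data` from the question (repeated so the block is self-contained); the legend is an illustrative addition, not part of either answer.

```r
gender_data <- data.frame(
  `Olympic year` = seq(1900, 2004, by = 4),
  `Men's winning time (s)` = c(11, 11, 10.8, 10.8, NA, 10.8, 10.6, 10.8, 10.3, 10.3, NA, NA,
                               10.3, 10.4, 10.5, 10.2, 10, 9.95, 10.14, 10.06, 10.25, 9.99,
                               9.92, 9.96, 9.84, 9.87, 9.85),
  `Women's winning time (s)` = c(NA, NA, NA, NA, NA, NA, NA, 12.2, 11.9, 11.5, NA, NA,
                                 11.9, 11.5, 11.5, 11, 11.4, 11.08, 11.07, 11.08, 11.06,
                                 10.97, 10.54, 10.82, 10.94, 10.75, 10.93),
  check.names = FALSE
)

# One linear model per column; by default lm() drops the years with NA times
fit_men   <- lm(`Men's winning time (s)` ~ `Olympic year`, data = gender_data)
fit_women <- lm(`Women's winning time (s)` ~ `Olympic year`, data = gender_data)

# Scatter plot as in the question
plot(x = gender_data$`Olympic year`, y = gender_data$`Men's winning time (s)`,
     xlab = "year", ylab = "times", ylim = c(7, 13), col = "green", pch = "*")
points(x = gender_data$`Olympic year`, y = gender_data$`Women's winning time (s)`,
       col = "blue", pch = "`")

# abline() takes the intercept and slope from each fitted model
abline(fit_men, col = "green")
abline(fit_women, col = "blue")
legend("topright", legend = c("Men", "Women"), col = c("green", "blue"), lty = 1)

# Fitted slopes, in seconds per year
print(c(men = coef(fit_men)[[2]], women = coef(fit_women)[[2]]))
```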
Reputation: 39585
Try ggplot2 and other tidyverse functions. You can reshape the data to long format, keeping the year as its own column, and then use geom_point() for the scatter plot. For the lines of best fit, geom_smooth(method = 'lm') draws one linear regression line per group. Alternatively, you could leave out method = 'lm' and keep the default smoother, which is loess when the largest group has fewer than 1,000 observations. Here is the code:
library(dplyr)
library(tidyr)
library(ggplot2)
# Reshape to long format, then draw the points and one linear fit per series
# (group = name is needed because x is discrete)
gender_data %>% pivot_longer(-c(`Olympic year`)) %>%
  ggplot(aes(x = factor(`Olympic year`), y = value, color = name, group = name)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  theme(axis.text.x = element_text(angle = 90),
        legend.position = 'top') +
  labs(x = 'Year', color = 'Variable')
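Because this code maps x to factor(`Olympic year`), geom_smooth(method = 'lm') fits each line against the positions of the year levels on the discrete axis (1, 2, 3, ...) rather than against the years themselves. The Olympic years are evenly spaced 4 years apart, so this is only a change of units: the fitted lines are the same, and the slope per axis step is 4 times the slope per year. A minimal base-R sketch that checks this with one `lm()` per column, assuming the `gender_data` from the question (repeated so the block runs on its own):

```r
gender_data <- data.frame(
  `Olympic year` = seq(1900, 2004, by = 4),
  `Men's winning time (s)` = c(11, 11, 10.8, 10.8, NA, 10.8, 10.6, 10.8, 10.3, 10.3, NA, NA,
                               10.3, 10.4, 10.5, 10.2, 10, 9.95, 10.14, 10.06, 10.25, 9.99,
                               9.92, 9.96, 9.84, 9.87, 9.85),
  `Women's winning time (s)` = c(NA, NA, NA, NA, NA, NA, NA, 12.2, 11.9, 11.5, NA, NA,
                                 11.9, 11.5, 11.5, 11, 11.4, 11.08, 11.07, 11.08, 11.06,
                                 10.97, 10.54, 10.82, 10.94, 10.75, 10.93),
  check.names = FALSE
)

years <- gender_data$`Olympic year`
# Position of each year on the discrete axis created by factor(`Olympic year`): 1, 2, ..., 27
pos <- as.integer(factor(years))

time_cols <- c("Men's winning time (s)", "Women's winning time (s)")
slopes <- t(sapply(time_cols, function(col) {
  y <- gender_data[[col]]  # lm() drops the NA years by default
  c(per_year  = coef(lm(y ~ years))[[2]],
    per_level = coef(lm(y ~ pos))[[2]])
}))
print(slopes)
```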
Leaving out method = 'lm' gives the default smoother:
# Same plot, using geom_smooth()'s default smoothing method
gender_data %>% pivot_longer(-c(`Olympic year`)) %>%
  ggplot(aes(x = factor(`Olympic year`), y = value, color = name, group = name)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  theme(axis.text.x = element_text(angle = 90),
        legend.position = 'top') +
  labs(x = 'Year', color = 'Variable')
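To see numerically how a loess curve departs from the straight lm line, here is a minimal sketch for the men's series only. It uses `stats::loess()` with `span = 0.75` (loess's own default, and also geom_smooth()'s default span) and fits against the year itself rather than the factor positions used in the plot, so treat it as an illustration of the two smoothers rather than a reproduction of the ggplot output. It assumes the `gender_data` from the question, repeated here.

```r
gender_data <- data.frame(
  `Olympic year` = seq(1900, 2004, by = 4),
  `Men's winning time (s)` = c(11, 11, 10.8, 10.8, NA, 10.8, 10.6, 10.8, 10.3, 10.3, NA, NA,
                               10.3, 10.4, 10.5, 10.2, 10, 9.95, 10.14, 10.06, 10.25, 9.99,
                               9.92, 9.96, 9.84, 9.87, 9.85),
  check.names = FALSE
)

# Men's series with the missing years removed
men <- na.omit(data.frame(year = gender_data$`Olympic year`,
                          time = gender_data$`Men's winning time (s)`))

fit_loess <- loess(time ~ year, data = men, span = 0.75)  # local quadratic fit (degree = 2 by default)
fit_lm    <- lm(time ~ year, data = men)                  # single straight line

# Compare both fits on every Olympic year in the observed range
grid <- data.frame(year = seq(1900, 2004, by = 4))
comparison <- data.frame(year  = grid$year,
                         loess = predict(fit_loess, newdata = grid),
                         lm    = predict(fit_lm, newdata = grid))
print(round(comparison, 2))
```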