Reputation: 452
I have some data similar to this:
year car_type
1 1993 sport
2 1994 sport
3 1945 family
4 1955 off-road
5 1998 sport
6 1966 off-road
7 2001 super
8 1999 super
9 2010 super
10 1988 off-road
11 1988 off-road
12 1988 sport
13 2014 sport
14 2056 super
15 2022 family
16 2022 family
17 2008 family
18 2001 off-road
19 2018 super
20 2008 family
21 2020 sport
22 2013 sport
23 2014 super
24 2015 off-road
25 2014 off-road
26 2013 sport
27 2013 super
28 2014 super
29 2020 off-road
30 2020 sport
Note: both year and car_type can occur more than once.
I want to plot a line graph or scatter plot with the year on the x axis and, on the y axis, the number of cars recorded in that year (counting every car_type).
I can see how to plot multiple lines from https://r-graphics.org/recipe-line-graph-multiple-line, but I don't know how to plot a line graph of one variable against its number of occurrences, so that the x axis is the year and the y axis is the number of times that year occurs. The same goes for a scatter plot.
I can show the same idea in a stacked bar chart, but that doesn't show the occurrence of these cars over time. Any help would be appreciated.
Upvotes: 0
Views: 1353
Reputation: 79246
Maybe you are interested in this kind of solution?
library(tidyverse)
library(lubridate) # for working with dates
library(scales) # to access breaks/formatting functions
# df is the data frame defined below
df %>%
  dplyr::count(year, name = "N") %>% # number of cars per year, any car_type
  arrange(year) %>%
  mutate(year = lubridate::ymd(year, truncated = 2L)) %>% # numeric year -> Date
  ggplot() +
  aes(x = year, y = N) +
  geom_line(color = "steelblue", size = 1) + # newer ggplot2 (>= 3.4.0) prefers linewidth = 1
  scale_x_date(breaks = date_breaks("5 year"), date_labels = "%Y") +
  geom_point() +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  xlab("year") +
  ylab("Cars(N)") +
  ylim(0, 6) +
  ggtitle("Cars per year")
df <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30),
year = c(1993, 1994, 1945, 1955, 1998, 1966, 2001, 1999,
2010, 1988, 1988, 1988, 2014, 2056, 2022, 2022, 2008, 2001, 2018,
2008, 2020, 2013, 2014, 2015, 2014, 2013, 2013, 2014, 2020, 2020),
car_type = c("sport", "sport", "family", "off-road", "sport",
"off-road", "super", "super", "super", "off-road", "off-road",
"sport", "sport", "super", "family", "family", "family", "off-road",
"super", "family", "sport", "sport", "super", "off-road", "off-road",
"sport", "super", "super", "off-road", "sport"))
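As a quick check on the counting step in the pipeline above, the per-year totals can also be computed on their own, without plotting. This is only a minimal sketch in base R; the names years and counts are introduced here for illustration and are not part of the answer.

```r
# Years copied from the question's data (30 rows)
years <- c(1993, 1994, 1945, 1955, 1998, 1966, 2001, 1999, 2010, 1988,
           1988, 1988, 2014, 2056, 2022, 2022, 2008, 2001, 2018, 2008,
           2020, 2013, 2014, 2015, 2014, 2013, 2013, 2014, 2020, 2020)

# table() counts how often each distinct year occurs;
# responseName sets the name of the count column
counts <- as.data.frame(table(year = years),
                        responseName = "N",
                        stringsAsFactors = FALSE)
counts$year <- as.integer(counts$year)

# Show only the years in which more than one car occurs
counts[counts$N > 1, ]
```

The resulting counts data frame has one row per distinct year, which is the shape a line or scatter plot of N against year needs.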
Upvotes: 1
Reputation: 12739
This is a scatter plot version based on the data in your question.
library(ggplot2)
library(dplyr)
The problem with a simple scatter plot is that, because one axis is discrete, points will overlap, as in the first example:
ggplot(df)+
geom_point(aes(year, car))
To make the graph more meaningful you can summarise the data by count of cars for a given category and year as follows:
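# count cars per year and category
df1 <-
  df %>%
  group_by(year, car) %>%
  summarise(count = n(), .groups = "drop")
# map the count to point size, with one legend key per distinct count
ggplot(df1) +
  geom_point(aes(year, car, size = count)) +
  scale_size_continuous(breaks = unique(df1$count))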
data
df <- structure(list(id = 2:30, year = c(1994L, 1945L, 1955L, 1998L,
1966L, 2001L, 1999L, 2010L, 1988L, 1988L, 1988L, 2014L, 2056L,
2022L, 2022L, 2008L, 2001L, 2018L, 2008L, 2020L, 2013L, 2014L,
2015L, 2014L, 2013L, 2013L, 2014L, 2020L, 2020L), car = c("sport",
"family", "off-road", "sport", "off-road", "super", "super",
"super", "off-road", "off-road", "sport", "sport", "super", "family",
"family", "family", "off-road", "super", "family", "sport", "sport",
"super", "off-road", "off-road", "sport", "super", "super", "off-road",
"sport")), class = "data.frame", row.names = c(NA, -29L))
Created on 2021-04-10 by the reprex package (v2.0.0)
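A possible shortcut for the same idea, offered as a sketch rather than as part of the reprex above: ggplot2's geom_count() counts the rows at each (x, y) position and maps that count to point size, so the separate summarise step can be skipped. The data frame cars_small is a name introduced here and holds only rows 22 to 30 of the question's data.

```r
library(ggplot2)

# Rows 22-30 of the question's data, enough to produce overlapping points
cars_small <- data.frame(
  year = c(2013, 2014, 2015, 2014, 2013, 2013, 2014, 2020, 2020),
  car  = c("sport", "super", "off-road", "off-road", "sport",
           "super", "super", "off-road", "sport")
)

# geom_count() sizes each point by the number of rows at that (year, car)
p <- ggplot(cars_small, aes(year, car)) +
  geom_count() +
  scale_size_area(max_size = 6) # point area proportional to the count
p
```

The trade-off is less control over the legend than with the explicit df1 summary, since the counts are computed inside the layer.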
Upvotes: 1
Reputation: 38063
In ggplot2, layers have two important components: a geom and a stat. Some layers, like geom_bar(), come with a non-identity stat attached automatically, in this case stat_count(). If you want to replicate geom_bar() behaviour with geom_line(), you need to supply the right stat to the layer.
library(ggplot2)
# Assuming 'data' is a data.frame with the data you've posted
ggplot(data, aes(year, colour = car_type)) +
geom_line(stat = "count")
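Since the question asks for the count across all car types, one variation on the code above is to drop the colour aesthetic so that stat_count() tallies every row per year. This is a minimal sketch; cars_df is a name introduced here and holds only the year column from the question's data.

```r
library(ggplot2)

# Year column copied from the question's data (30 rows)
cars_df <- data.frame(
  year = c(1993, 1994, 1945, 1955, 1998, 1966, 2001, 1999, 2010, 1988,
           1988, 1988, 2014, 2056, 2022, 2022, 2008, 2001, 2018, 2008,
           2020, 2013, 2014, 2015, 2014, 2013, 2013, 2014, 2020, 2020)
)

# Without a colour/group aesthetic, stat = "count" yields one total per year
p <- ggplot(cars_df, aes(year)) +
  geom_line(stat = "count") +
  geom_point(stat = "count") +
  labs(x = "year", y = "number of cars")
p
```

Note that years with no cars do not appear in the data at all, so the line runs straight between neighbouring observed years instead of dropping to zero.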
Upvotes: 1