John Garland

Reputation: 513

How to plot each level of a factor

I am trying to make a plot in which each level of a factor gets its own series. Although I am a long-time R user, I am not up to date with some of the latest improvements. For example, I have not yet learned ggplot2, which figures in some related questions, so I cannot yet translate what I want to do into ggplot2. Here is a simple example:

#library(tidyverse) # uncomment if not loaded

in_data <- read_csv("http://www.nfgarland.ca/National_Custom_Data.csv")
in_data <- in_data %>% 
  # inside mutate() the columns can be referenced directly, without in_data$
  mutate(Tot = `NUM INFLUENZA DEATHS` + `NUM PNEUMONIA DEATHS`) %>% 
  arrange(SEASON) %>%
  mutate(SEASON = factor(SEASON, ordered = TRUE))

filter(in_data,SEASON == "2015-16")$Tot %>% plot((1:length(.)),
                                             ., 
                                             type = "l",
                                             col = "red",
                                             xlab ="Flu Season Week",
                                             ylab = "Deaths",
                                             ylim = c(2000,7500))
filter(in_data,SEASON == "2016-17")$Tot %>% lines((1:length(.)),., col="orange")
filter(in_data,SEASON == "2017-18")$Tot %>% lines((1:length(.)),. ,col="blue")
filter(in_data,SEASON == "2018-19")$Tot %>% lines((1:length(.)),. ,col="green")
filter(in_data,SEASON == "2019-20")$Tot %>% lines((1:length(.)),. ,col="black")

As you can see, I have learned a number of tidyverse concepts, and this code works fine. But I assume there ought to be a way to do this automagically in the tidyverse, without writing each lines() call separately, and I cannot find it. I do know how to handle palettes, so the color changes are no problem. Note also that while the previous seasons each have 52 weeks of data, this file contains only the first 24 weeks of the current flu season.

Upvotes: 2

Views: 262

Answers (2)

StupidWolf

Reputation: 46908

In base R you need a for loop and, unlike in ggplot2, you also have to build the legend yourself. Here is a suggestion in base R (the good old days):

library(readr)
library(dplyr)

COLS = c("red","goldenrod","blue","orange","green")
names(COLS) = levels(in_data$SEASON)

# empty canvas sized to the data, then one line per season
plot(NULL, xlim = range(in_data$WEEK), ylim = range(in_data$Tot),
     xlab = "time", ylab = "Tot")
for (nu in levels(in_data$SEASON)) {
  lines(1:sum(in_data$SEASON == nu),
        in_data$Tot[in_data$SEASON == nu],
        col = COLS[nu])
}

legend("topright",fill=COLS,names(COLS))


If you need the actual week numbers on the x-axis (as you mentioned in the comment, a season runs from week 40 or so into the following year), it will take a bit more code (and maybe pain).
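One way that extra code might look is sketched below. It is only a sketch under assumptions: `season_week()` is a hypothetical helper, not part of any package, and it assumes every season starts at calendar week 40 and that the year has 52 weeks (a year with a week 53 would need extra handling). The toy data frame stands in for a single season of `in_data`.

```r
# Hypothetical helper: position of a calendar week within a flu season.
# Assumption: the season starts at week 40 and the year has 52 weeks.
season_week <- function(week, start = 40, weeks_in_year = 52) {
  ((week - start) %% weeks_in_year) + 1
}

# Toy data standing in for one season (values are made up)
toy <- data.frame(WEEK = c(40, 41, 52, 1, 2, 39),
                  Tot  = c(3000, 3100, 4000, 4200, 4100, 2900))

# Week 40 becomes season week 1, week 1 becomes 14, week 39 becomes 52
toy$SEASON_WEEK <- season_week(toy$WEEK)

# Sort by position in the season so lines() draws left to right
toy <- toy[order(toy$SEASON_WEEK), ]
```

Inside the loop above, `season_week(in_data$WEEK[in_data$SEASON == nu])` could then replace `1:sum(in_data$SEASON == nu)` as the x values, with `xlim = c(1, 52)` in the `plot()` call.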

Upvotes: 0

Ian Campbell

Reputation: 24790

How about this?

library(ggplot2)
ggplot(in_data, aes(x=WEEK,y=Tot, color = SEASON)) + 
  geom_line() + 
  labs(x = "Flu Season Week", y = "Deaths") +
  ylim(2000,7500) + 
  scale_color_manual(values = c("red","goldenrod","blue","orange","green"))


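Edit: To address the OP's comment about wanting a break in the 2019-20 line, we can use a quick pivot (wide, then back to long) so that the weeks missing from that season are filled in with NA. geom_line() leaves a gap wherever the value is NA.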

in_data %>% dplyr::select(SEASON,Tot,WEEK) %>%
  # wide: one column per season, so weeks a season lacks become NA
  tidyr::pivot_wider(names_from = SEASON, values_from = Tot) %>%
  # back to long: the NA rows are kept and break the line
  tidyr::pivot_longer(cols = (-WEEK), names_to = "SEASON", values_to = "Tot") %>%
  ggplot(aes(x=WEEK,y=Tot, color = SEASON)) + 
  geom_line() + 
  labs(x = "Flu Season Week", y = "Deaths") +
  ylim(2000,7500) + 
  scale_color_manual(values = c("red","goldenrod","blue","orange","green"))
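To see what the pivot round trip does, here is a small sketch on made-up data (the `toy` tibble and its values are assumptions for illustration, not the real file). Season "B" has only two of the four weeks, much like the incomplete current season.

```r
library(dplyr)
library(tidyr)

# Toy data: season "A" has 4 weeks, season "B" only 2
toy <- tibble(
  SEASON = c(rep("A", 4), rep("B", 2)),
  WEEK   = c(1:4, 1:2),
  Tot    = c(10, 12, 14, 13, 11, 15)
)

filled <- toy %>%
  # one column per season; weeks that "B" lacks become NA
  pivot_wider(names_from = SEASON, values_from = Tot) %>%
  # back to one row per season/week pair, NA rows included
  pivot_longer(cols = -WEEK, names_to = "SEASON", values_to = "Tot")

filled  # 8 rows: every SEASON x WEEK combination, 2 of them with Tot = NA
```

Note that after the round trip SEASON is a plain character column rather than the ordered factor from the question, which is fine for coloring but changes the type.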


Upvotes: 3
