Jasmine Koh

Reputation: 41

How can I make a line graph in R with multiple lines, where each row is a different line?

I'm trying to make a line graph that will show some projected growth. This is part of the dataframe I'm using.

                           skill skillpostingpct one_yrgrowth two_yrgrowth five_yrgrowth
17              Network Security       0.1529210    0.1208623   0.08748219   -0.01860132
5                         Python       0.1701031    0.2948260   0.42366650    0.82719194
4             Project Management       0.2268041    0.2157136   0.20596367    0.18497099
3            Information Systems       0.2405498    0.1884082   0.13563518   -0.02358238
2           Information Security       0.6116838    0.6500081   0.68701918    0.78847658
1  Quality Assurance and Control       0.9106529    0.9046785   0.89953675    0.88918069

How can I make a line graph that shows projected growth, with the y-axis as a percentage and the x-axis as each of the numerical columns (skillpostingpct, one_yrgrowth, two_yrgrowth, five_yrgrowth)? My biggest issue is making a legend so that each skill name (column one) is a different line and the skill names are the labels in the legend.

I'd really appreciate any help on this, thank you!

Upvotes: 0

Views: 608

Answers (1)

Jon Spring

Reputation: 66935

ggplot2 works most easily with "tidy" data, where:

  1. Each variable forms a column.
  2. Each observation forms a row.
  3. Each type of observational unit forms a table.

"Tidy" data works most smoothly with the syntax of ggplot, which expects to map each variable (e.g. skill, growth rate, time period) from the column it appears in to an aesthetic (like x, y, and color).

In this case, your starting format is "wide": each row holds several observations, and each numeric column encodes a different point in time. In "long" form, all the values sit in a single column, one per row, and a separate "time" column says which period each value belongs to. You can reshape your data this way with pivot_longer from the tidyr package, which is loaded with tidyverse.
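To make the reshaping concrete, here is a small sketch that rebuilds the data frame from the question (the name df is my assumption, since the question doesn't say what the object is called) and pivots it. With no names_to or values_to given, pivot_longer puts the old column names in a column called name and the numbers in a column called value.

```r
library(tidyr)

# Data copied from the question; `df` is an assumed name for it.
df <- data.frame(
  skill = c("Network Security", "Python", "Project Management",
            "Information Systems", "Information Security",
            "Quality Assurance and Control"),
  skillpostingpct = c(0.1529210, 0.1701031, 0.2268041,
                      0.2405498, 0.6116838, 0.9106529),
  one_yrgrowth    = c(0.1208623, 0.2948260, 0.2157136,
                      0.1884082, 0.6500081, 0.9046785),
  two_yrgrowth    = c(0.08748219, 0.42366650, 0.20596367,
                      0.13563518, 0.68701918, 0.89953675),
  five_yrgrowth   = c(-0.01860132, 0.82719194, 0.18497099,
                      -0.02358238, 0.78847658, 0.88918069)
)

# Keep `skill` as an identifier and stack every other column.
long <- pivot_longer(df, -skill)

# 6 skills x 4 time columns = 24 rows, in columns skill / name / value.
dim(long)
head(long, 4)
```

Each skill now takes up four rows, one per time period, which is the shape ggplot wants.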

The time columns have a meaningful order, and by default ggplot would plot them alphabetically. To avoid that, I use forcats::fct_inorder to turn name into a factor, time, whose levels follow the order in which they first appear. When that variable is mapped to the x axis, the periods show up in the order we want. (Try replacing time with name in the ggplot(... line and you'll see five_yrgrowth appear first, since it's earliest alphabetically.)
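As a quick illustration of the ordering difference, this sketch compares the levels you get from a plain factor() call with those from fct_inorder(), using the four column names from the question as they would appear in the name column.

```r
library(forcats)

# Column names in the order they appear in the data.
name <- c("skillpostingpct", "one_yrgrowth", "two_yrgrowth", "five_yrgrowth")

# Default factor levels are sorted alphabetically.
alphabetical <- levels(factor(name))

# fct_inorder keeps the levels in order of first appearance.
by_appearance <- levels(fct_inorder(name))

alphabetical   # "five_yrgrowth" "one_yrgrowth" "skillpostingpct" "two_yrgrowth"
by_appearance  # "skillpostingpct" "one_yrgrowth" "two_yrgrowth" "five_yrgrowth"
```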

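library(tidyverse)
df %>%
  # stack every column except skill into name/value pairs
  pivot_longer(-skill) %>%
  # keep the periods in their original column order, not alphabetical
  mutate(time = forcats::fct_inorder(name)) %>%
  # one line per skill; color also builds the legend
  ggplot(aes(time, value, color = skill, group = skill)) +
  geom_line()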
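
The question also asks for the y axis as a percentage. One way to get that, as a sketch rather than the only option, is to add scale_y_continuous(labels = scales::percent) to the same pipeline. The example below uses just two rows of the question's data to stay short; df_small and the legend title "Skill" are my own choices.

```r
library(tidyverse)

# Two rows from the question's data; `df_small` is an assumed name.
df_small <- data.frame(
  skill = c("Python", "Information Systems"),
  skillpostingpct = c(0.1701031, 0.2405498),
  one_yrgrowth    = c(0.2948260, 0.1884082),
  two_yrgrowth    = c(0.42366650, 0.13563518),
  five_yrgrowth   = c(0.82719194, -0.02358238)
)

p <- df_small %>%
  pivot_longer(-skill) %>%
  mutate(time = forcats::fct_inorder(name)) %>%
  ggplot(aes(time, value, color = skill, group = skill)) +
  geom_line() +
  # format the 0-1 proportions as percentages on the y axis
  scale_y_continuous(labels = scales::percent) +
  labs(color = "Skill")

p
```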


Upvotes: 1
