Reputation: 398
I have a data frame as shown below:
This data is saved as the variable "filterWage".
This data frame contains the columns Country.Code, Series.Code and a range of columns from X1992 to X2016 (please bear in mind that I could not fit all of the X1992 to X2016 columns, so only those up to X2003 are shown in the image).
The objective is to plot the years in this range of columns (X1992 to X2016) on the x-axis and the values of these columns on the y-axis, for all three Country.Code values in the same plot, using ggplot.
Desired outcome (please note that the image is merely a rough sketch and the values are meaningless):
This is the output using dput:
dput(filterWage)
structure(list(Country.Code = c("LIC", "HIC", "MIC"), Series.Code = c("SL.EMP.WORK.ZS",
"SL.EMP.WORK.ZS", "SL.EMP.WORK.ZS"), X1991 = c("20.9370976972316",
"81.0876932275574", "35.5281394063616"), X1992 = c("20.5114136551512",
"81.1351300966788", "36.1635880437505"), X1993 = c("20.309137441086",
"81.2165339365649", "37.1943086793304"), X1994 = c("20.5295488411938",
"81.3404039783739", "37.8383615292357"), X1995 = c("20.6817100202905",
"81.6237989883691", "38.6979499878051"), X1996 = c("20.6371916830899",
"81.8361588628956", "39.5068057398044"), X1997 = c("20.286823787263",
"82.140587079514", "40.0301927962263"), X1998 = c("20.3800244386649",
"82.4387485706644", "40.1689926776"), X1999 = c("20.764112251619",
"82.7303105606365", "40.3738643748966"), X2000 = c("20.5693165666214",
"83.0691410634413", "40.7860042844162"), X2001 = c("20.6682554227926",
"83.204549665691", "40.192062080076"), X2002 = c("20.8364224185492",
"83.3236267668205", "40.5335866623684"), X2003 = c("20.9073131339766",
"83.3872571313811", "41.139037517746"), X2004 = c("20.9741288400519",
"83.4445860257721", "42.2303006080139"), X2005 = c("20.6931847813705",
"83.7017144881631", "43.2626386469723"), X2006 = c("21.0482961178193",
"84.0126990344844", "44.4032188240263"), X2007 = c("21.3789126998501",
"84.3099847840774", "45.3836159214118"), X2008 = c("21.713214795025",
"84.5962197639565", "46.1155674823931"), X2009 = c("21.9697284827288",
"84.5498700141843", "46.8058440395641"), X2010 = c("22.3676584297642",
"84.614095791104", "47.6604416403023"), X2011 = c("22.383629219082",
"84.8323447185694", "48.6708213003224"), X2012 = c("22.6398140927035",
"85.1570293953982", "49.2830314898562"), X2013 = c("23.0490884430663",
"85.3153737253528", "49.5549460027067"), X2014 = c("22.8973838689315",
"85.4292150603637", "50.0215575751258"), X2015 = c("22.9079191238809",
"85.6087846399656", "50.3787072273931"), X2016 = c("22.8986911131366",
"85.7321179083769", "50.5504090357067")), row.names = c(166L,
332L, 498L), class = "data.frame")
Upvotes: 1
Views: 278
Reputation: 1015
Here is a solution using tidyr and dplyr (as well as ggplot2):
library(ggplot2)
library(tidyr)
library(dplyr)

filterWage %>%
  # Reshape wide to long: one row per country and year
  tidyr::pivot_longer(cols = starts_with("X"), names_to = "years", values_to = "value") %>%
  # Strip the leading "X" from the year and convert both columns to numeric
  dplyr::mutate(years = as.numeric(gsub("X", "", years)), value = as.numeric(value)) %>%
  ggplot(aes(x = years, y = value, colour = Country.Code)) +
  geom_line() +
  theme_minimal()
I can't test this because I don't have your data, but it should work.
The idea is to turn all those year columns into a single pair of columns, one storing the former column names and one storing the values. This way, your data is in the long format rather than the wide format, which is the format ggplot works best with. Then, mutate() converts both columns to numeric variables, removing the "X" from the years.
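To make the wide-to-long step concrete, here is a minimal sketch on a small made-up data frame. The name toy_wide, its country codes and its values are hypothetical and only mimic the layout of filterWage (year columns prefixed with "X", values stored as character); it is an illustration, not the original data.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy data in the same wide layout as filterWage
# (made-up values, stored as character like in the question).
toy_wide <- data.frame(
  Country.Code = c("AAA", "BBB"),
  Series.Code  = c("S1", "S1"),
  X1991 = c("1.5", "10.0"),
  X1992 = c("2.5", "11.0"),
  stringsAsFactors = FALSE
)

toy_long <- toy_wide %>%
  # Collapse every column starting with "X" into a years/value pair
  pivot_longer(cols = starts_with("X"), names_to = "years", values_to = "value") %>%
  # Drop the leading "X" and convert both new columns to numeric
  mutate(years = as.numeric(sub("^X", "", years)), value = as.numeric(value))

# Inspect the reshaped data: one row per country and year
str(toy_long)
```

Each country now contributes one row per year, which is the shape that lets aes(x = years, y = value, colour = Country.Code) draw one line per country.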
Here is the resulting plot for filterWage:
Upvotes: 1