constantlyFlagged

Reputation: 398

Plot multiple scatter graphs using a range of columns as the x-axis

I have a data frame as shown below:

DataFrame

This data is saved as the variable "filterWage".

This data frame contains the columns Country.Code and Series.Code, plus a range of columns from X1992 to X2016 (please bear in mind that I could not fit all of the columns from X1992 to X2016 in the image, so only those up to X2003 are shown).

The objective is to plot this range of columns, X1992 to X2016, on the x-axis and their values on the y-axis, for all three Country.Code values in the same plot, using ggplot.

Desired outcome (please note that the image is merely a rough sketch; the values in it are meaningless and should be ignored):

This is the output using dput:

dput(filterWage)

structure(list(Country.Code = c("LIC", "HIC", "MIC"), Series.Code = c("SL.EMP.WORK.ZS", 
"SL.EMP.WORK.ZS", "SL.EMP.WORK.ZS"), X1991 = c("20.9370976972316", 
"81.0876932275574", "35.5281394063616"), X1992 = c("20.5114136551512", 
"81.1351300966788", "36.1635880437505"), X1993 = c("20.309137441086", 
"81.2165339365649", "37.1943086793304"), X1994 = c("20.5295488411938", 
"81.3404039783739", "37.8383615292357"), X1995 = c("20.6817100202905", 
"81.6237989883691", "38.6979499878051"), X1996 = c("20.6371916830899", 
"81.8361588628956", "39.5068057398044"), X1997 = c("20.286823787263", 
"82.140587079514", "40.0301927962263"), X1998 = c("20.3800244386649", 
"82.4387485706644", "40.1689926776"), X1999 = c("20.764112251619", 
"82.7303105606365", "40.3738643748966"), X2000 = c("20.5693165666214", 
"83.0691410634413", "40.7860042844162"), X2001 = c("20.6682554227926", 
"83.204549665691", "40.192062080076"), X2002 = c("20.8364224185492", 
"83.3236267668205", "40.5335866623684"), X2003 = c("20.9073131339766", 
"83.3872571313811", "41.139037517746"), X2004 = c("20.9741288400519", 
"83.4445860257721", "42.2303006080139"), X2005 = c("20.6931847813705", 
"83.7017144881631", "43.2626386469723"), X2006 = c("21.0482961178193", 
"84.0126990344844", "44.4032188240263"), X2007 = c("21.3789126998501", 
"84.3099847840774", "45.3836159214118"), X2008 = c("21.713214795025", 
"84.5962197639565", "46.1155674823931"), X2009 = c("21.9697284827288", 
"84.5498700141843", "46.8058440395641"), X2010 = c("22.3676584297642", 
"84.614095791104", "47.6604416403023"), X2011 = c("22.383629219082", 
"84.8323447185694", "48.6708213003224"), X2012 = c("22.6398140927035", 
"85.1570293953982", "49.2830314898562"), X2013 = c("23.0490884430663", 
"85.3153737253528", "49.5549460027067"), X2014 = c("22.8973838689315", 
"85.4292150603637", "50.0215575751258"), X2015 = c("22.9079191238809", 
"85.6087846399656", "50.3787072273931"), X2016 = c("22.8986911131366", 
"85.7321179083769", "50.5504090357067")), row.names = c(166L, 
332L, 498L), class = "data.frame")

Upvotes: 1

Views: 278

Answers (1)

Érico Patto

Reputation: 1015

Here is a solution using tidyr and dplyr (as well as ggplot2):

library(ggplot2)
library(tidyr)
library(dplyr)

filterWage %>%
  # reshape wide to long: one row per country and year
  tidyr::pivot_longer(cols = starts_with("X"), names_to = "years", values_to = "value") %>%
  # drop the "X" prefix and convert both columns from character to numeric
  dplyr::mutate(years = as.numeric(gsub("X", "", years)), value = as.numeric(value)) %>%
  ggplot(aes(x = years, y = value, colour = Country.Code)) +
  geom_line() +
  theme_minimal()

I can't test this because I don't have your data, but it should work.

The idea is to turn all those year columns into a single pair of columns, one storing the former column names and one storing the values. This puts your data in the long format rather than the wide one. ggplot works best with long data, because each aesthetic (x, y, colour) maps to a single column. Then mutate() converts both of these columns to numeric, removing the "X" prefix from the years.
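To make the reshaping step concrete, here is a minimal sketch on a tiny stand-in data frame. The names toy_wide and toy_long are made up for illustration, and the values are rounded from the first two year columns of your dput output, so treat it as a demonstration rather than your real data.

```r
library(tidyr)
library(dplyr)

# Hypothetical two-country, two-year stand-in for filterWage.
# Values are stored as character strings, as in the dput output.
toy_wide <- data.frame(
  Country.Code = c("LIC", "HIC"),
  X1991 = c("20.94", "81.09"),
  X1992 = c("20.51", "81.14"),
  stringsAsFactors = FALSE
)

toy_long <- toy_wide %>%
  # each X column becomes rows: the column name goes to `years`, the cell to `value`
  pivot_longer(cols = starts_with("X"), names_to = "years", values_to = "value") %>%
  # strip the leading "X" and convert both columns to numeric
  mutate(years = as.numeric(sub("^X", "", years)), value = as.numeric(value))

print(toy_long)
```

The result has four rows (two countries times two years) and three columns: Country.Code, years and value. That is the shape aes(x = years, y = value, colour = Country.Code) expects.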

Here is the output:

(image of the resulting plot)
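If you want points rather than lines, as the question title suggests, a minimal sketch could look like the one below. The helper name plot_wage_scatter and its first_year and last_year arguments are my own, not part of any package. I am also assuming you want to leave out X1991, since you only mention X1992 to X2016. The demo data frame is just the first three year columns of your dput output.

```r
library(ggplot2)
library(tidyr)
library(dplyr)

# Hypothetical helper: reshape a wide frame shaped like filterWage and
# draw one point per country and year within the requested year span.
plot_wage_scatter <- function(df, first_year = 1992, last_year = 2016) {
  long <- df %>%
    pivot_longer(cols = starts_with("X"), names_to = "years", values_to = "value") %>%
    mutate(years = as.numeric(sub("^X", "", years)), value = as.numeric(value)) %>%
    # keep only the years of interest (this drops X1991 by default)
    filter(between(years, first_year, last_year))

  ggplot(long, aes(x = years, y = value, colour = Country.Code)) +
    geom_point() +
    theme_minimal()
}

# Small stand-in for filterWage, copied from the first columns of the dput output
demo <- data.frame(
  Country.Code = c("LIC", "HIC", "MIC"),
  Series.Code = "SL.EMP.WORK.ZS",
  X1991 = c("20.9370976972316", "81.0876932275574", "35.5281394063616"),
  X1992 = c("20.5114136551512", "81.1351300966788", "36.1635880437505"),
  X1993 = c("20.309137441086", "81.2165339365649", "37.1943086793304"),
  stringsAsFactors = FALSE
)

p <- plot_wage_scatter(demo)
print(p)
```

With your real data it should be enough to call plot_wage_scatter(filterWage). Adding geom_line() after geom_point() would give points connected by lines.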

Upvotes: 1
