Joe

Reputation: 1241

Line plot of data by grouping rows

I want to draw a line plot of the data frame below by grouping the rows, so that I have one line for GDP, one line for agriculture and one line for services (ignoring countries for now). Does anyone know whether this is possible with ggplot?

The final plot would have years on the x-axis and the value (e.g. GDP) on the y-axis.

economics_df

Series Name              Country        1997        1998        1999        2000
GDP (current US$)        Spain   5.90077E+11 6.19215E+11 6.34908E+11 5.98363E+11
GDP (current US$)        France  1.45288E+12 1.50311E+12 1.49315E+12 1.36564E+12
GDP (current US$)        Monaco  2840175545  2934498443  2906093757  2647885849
GDP (current US$)        Italy   1.24188E+12 1.27005E+12 1.25245E+12 1.14668E+12
GDP (current US$)        Croatia 24091170703 25792876644 23677307509 21839780971
Agriculture (% of GDP)   Spain   4.302210034 4.150411966 3.817378211 3.745305634
Agriculture (% of GDP)   France  2.344255815 2.362459834 2.236261411 2.098357551
Agriculture (% of GDP)   Monaco  2.544255815 2.342459834 2.234261411 2.108357551
Agriculture (% of GDP)   Italy   2.861911574 2.768857277 2.722232363  2.56361412
Agriculture (% of GDP)   Croatia 5.228986538 5.306173593 5.393085168 4.961600952
Services (% of GDP)      Syria   45.65197856 44.15290647 45.68986146 41.94697681
Services (% of GDP)      Lebanon 60.61030928 58.32727829 59.05884148 61.52190623
Services (% of GDP)      Israel  62.02333939 63.02788655 63.92563162 64.72521236
Services (% of GDP)      Egypt   48.15193682 48.28789144 47.55581925 46.52599236
Services (% of GDP)      Libya   44.15193682 44.28789144 45.55581925 45.55581445

Upvotes: 0

Views: 40

Answers (1)

SamR

Reputation: 20260

You need to reshape the data first. ggplot makes plotting easy once the data is in long form (one row per series, country and year), and dplyr and tidyr make that reshaping straightforward:

library(dplyr)
library(ggplot2)
library(tidyr)

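econ_for_plot  <- economics_df  |>
    # Reshape to long form: one row per series, country and year
    pivot_longer(-c(`Series Name`, Country), names_to = "year")  |>
    # Aggregate across countries within each series and year
    group_by(`Series Name`, year)  |>
    summarise(value = sum(value))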

econ_for_plot
# # A tibble: 12 x 3
# # Groups:   Series Name [3]   
#    `Series Name` year    value
#    <chr>         <chr>   <dbl>
#  1 Agriculture   1997  1.73e 1
#  2 Agriculture   1998  1.69e 1
#  3 Agriculture   1999  1.64e 1
#  4 Agriculture   2000  1.55e 1
#  5 GDP           1997  3.31e12
#  6 GDP           1998  3.42e12
#  7 GDP           1999  3.41e12
#  8 GDP           2000  3.14e12
#  9 Services      1997  2.61e 2
# 10 Services      1998  2.58e 2
# 11 Services      1999  2.62e 2
# 12 Services      2000  2.60e 2

I have used sum() in the summarise() call, but you could replace it with mean() or any other function to aggregate the data. Once it is in this form you can plot it:


ggplot(econ_for_plot, 
    aes(
        x = year,
        y = value,
        color = `Series Name`,
        group = `Series Name`
    )
) +
    geom_point() +
    geom_line() +
    scale_y_log10() +
    labs(
        title = "Sum of spending",
        y = "Sum of category (log scale)"
    ) +
    theme_bw()
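
As mentioned above, sum() can be swapped for another aggregate. Here is a minimal sketch using mean(), which is arguably a more natural summary for the percentage series. It runs on a small subset of the input data (econ_small and econ_mean are names made up for this example). It also converts year to an integer via names_transform, so that the x-axis would be numeric rather than categorical, and passes .groups = "drop" so the result is ungrouped:

```r
library(dplyr)
library(tidyr)

# Small subset of economics_df (values copied from the input data),
# used here only to keep the example self-contained
econ_small <- data.frame(
    `Series Name` = c("Agriculture", "Agriculture", "Services", "Services"),
    Country = c("Spain", "France", "Egypt", "Libya"),
    `1997` = c(4.302210034, 2.344255815, 48.15193682, 44.15193682),
    `1998` = c(4.150411966, 2.362459834, 48.28789144, 44.28789144),
    check.names = FALSE  # keep "Series Name", "1997", "1998" as they are
)

econ_mean <- econ_small |>
    pivot_longer(
        -c(`Series Name`, Country),
        names_to = "year",
        names_transform = list(year = as.integer)  # "1997" becomes 1997L
    ) |>
    group_by(`Series Name`, year) |>
    # Average across countries instead of summing
    summarise(value = mean(value), .groups = "drop")

econ_mean
```

The result has the same three columns as econ_for_plot, so it should work with the ggplot() call above unchanged, apart from the title and axis labels.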


Input data

economics_df <- structure(list(`Series Name` = c(
    "GDP", "GDP", "GDP", "GDP",
    "GDP", "Agriculture", "Agriculture", "Agriculture", "Agriculture",
    "Agriculture", "Services", "Services", "Services", "Services",
    "Services"
), Country = c(
    "Spain", "France", "Monaco", "Italy",
    "Croatia", "Spain", "France", "Monaco", "Italy", "Croatia", "Syria",
    "Lebanon", "Israel", "Egypt", "Libya"
), `1997` = c(
    5.90077e+11,
    1.45288e+12, 2840175545, 1.24188e+12, 24091170703, 4.302210034,
    2.344255815, 2.544255815, 2.861911574, 5.228986538, 45.65197856,
    60.61030928, 62.02333939, 48.15193682, 44.15193682
), `1998` = c(
    6.19215e+11,
    1.50311e+12, 2934498443, 1.27005e+12, 25792876644, 4.150411966,
    2.362459834, 2.342459834, 2.768857277, 5.306173593, 44.15290647,
    58.32727829, 63.02788655, 48.28789144, 44.28789144
), `1999` = c(
    6.34908e+11,
    1.49315e+12, 2906093757, 1.25245e+12, 23677307509, 3.817378211,
    2.236261411, 2.234261411, 2.722232363, 5.393085168, 45.68986146,
    59.05884148, 63.92563162, 47.55581925, 45.55581925
), `2000` = c(
    5.98363e+11,
    1.36564e+12, 2647885849, 1.14668e+12, 21839780971, 3.745305634,
    2.098357551, 2.108357551, 2.56361412, 4.961600952, 41.94697681,
    61.52190623, 64.72521236, 46.52599236, 45.55581445
)), class = "data.frame", row.names = c(
    NA,
    -15L
))

Edit: I made the y-axis log-scale because the range of values is large. Having now read the comments and looked at the data more closely, I realise that this plots absolute dollars and percentages on the same scale. So this post shows how to construct such a plot, although it does not really make sense to do so in this case.
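
One possible way to avoid mixing dollars and percentages on a single axis is to give each series its own panel. This is only a sketch: to keep it self-contained, it rebuilds econ_for_plot by hand from the rounded values printed in the tibble above, and it drops the log scale because each panel gets its own y range through facet_wrap() with scales = "free_y":

```r
library(ggplot2)

# Rounded values copied from the summarised tibble printed earlier
econ_for_plot <- data.frame(
    `Series Name` = rep(c("Agriculture", "GDP", "Services"), each = 4),
    year = rep(c("1997", "1998", "1999", "2000"), times = 3),
    value = c(
        17.3, 16.9, 16.4, 15.5,              # Agriculture (sum of % of GDP)
        3.31e12, 3.42e12, 3.41e12, 3.14e12,  # GDP (sum of current US$)
        261, 258, 262, 260                   # Services (sum of % of GDP)
    ),
    check.names = FALSE  # keep the space in "Series Name"
)

p <- ggplot(
    econ_for_plot,
    aes(x = year, y = value, group = `Series Name`)
) +
    geom_point() +
    geom_line() +
    # One panel per series, each with an independent y-axis
    facet_wrap(vars(`Series Name`), scales = "free_y", ncol = 1) +
    labs(y = "Sum of category") +
    theme_bw()

p
```

The units still differ between panels (dollars versus percent), so the y-axis label stays generic here.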

Upvotes: 2
