Create new ratio indicator in long data

Question

I have a long data frame:

mydf <- data.frame(
    date=c("2016-01-01","2016-02-01","2016-03-01","2016-04-01","2016-05-01", "2016-02-01", "2016-03-01", "2016-04-01", "2016-05-01", "2016-06-01"),
    value=c(1,2,3,4,5,1,2,3,4,5),
    country=c("US", "US", "US", "US", "US", "US", "US", "US", "US", "US"),
    indicator=c("gdp", "gdp", "gdp", "gdp", "gdp", "population", "population", "population", "population", "population"))

         date value country  indicator
1  2016-01-01     1      US        gdp
2  2016-02-01     2      US        gdp
3  2016-03-01     3      US        gdp
4  2016-04-01     4      US        gdp
5  2016-05-01     5      US        gdp
6  2016-02-01     1      US population
7  2016-03-01     2      US population
8  2016-04-01     3      US population
9  2016-05-01     4      US population
10 2016-06-01     5      US population

I want to create new indicators computed as ratios of existing ones, e.g. GDP/population*1000.

The result would look something like this (the example omits the *1000 scaling). Each ratio has to be computed from values with matching dates:

mydf <- data.frame(
    date=c("2016-01-01","2016-02-01","2016-03-01","2016-04-01","2016-05-01", "2016-02-01", "2016-03-01", "2016-04-01", "2016-05-01", "2016-06-01", "2016-02-01", "2016-03-01", "2016-04-01", "2016-05-01"),
    value=c(1,2,3,4,5,1,2,3,4,5,2,1.5,1.33,1.25),
    country=c("US", "US", "US", "US", "US", "US", "US", "US", "US", "US", "US", "US", "US", "US"),
    indicator=c("gdp", "gdp", "gdp", "gdp", "gdp", "population", "population", "population", "population", "population", "gdp per capita", "gdp per capita", "gdp per capita", "gdp per capita"))

         date value country      indicator
1  2016-01-01  1.00      US            gdp
2  2016-02-01  2.00      US            gdp
3  2016-03-01  3.00      US            gdp
4  2016-04-01  4.00      US            gdp
5  2016-05-01  5.00      US            gdp
6  2016-02-01  1.00      US     population
7  2016-03-01  2.00      US     population
8  2016-04-01  3.00      US     population
9  2016-05-01  4.00      US     population
10 2016-06-01  5.00      US     population
11 2016-02-01  2.00      US gdp per capita
12 2016-03-01  1.50      US gdp per capita
13 2016-04-01  1.33      US gdp per capita
14 2016-05-01  1.25      US gdp per capita

Is there an easy way to do this in R?

Kevin Arseneau · Accepted Answer

Yes. I think this is easiest with a tidy approach using tidyr and dplyr: spread the indicators into columns, compute the ratio, then gather the result back into long form.

library(dplyr)
library(tidyr)

df <- tribble(
         ~date, ~value, ~country,   ~indicator,
  "2016-01-01",      1,     "US",        "gdp",
  "2016-02-01",      2,     "US",        "gdp",
  "2016-03-01",      3,     "AU",        "gdp",
  "2016-04-01",      4,     "US",        "gdp",
  "2016-05-01",      5,     "US",        "gdp",
  "2016-02-01",      1,     "US", "population",
  "2016-03-01",      2,     "AU", "population",
  "2016-04-01",      3,     "US", "population",
  "2016-05-01",      4,     "US", "population",
  "2016-06-01",      5,     "US", "population"
)

df %>%
  group_by(country) %>%
  spread(indicator, value) %>%
  mutate(`gdp per capita` = gdp / population) %>%
  gather(indicator, value, -c(date, country)) %>%
  drop_na(value)

# # A tibble: 14 x 4
# # Groups:   country [2]
#          date country      indicator    value
#         <chr>   <chr>          <chr>    <dbl>
#  1 2016-01-01      US            gdp 1.000000
#  2 2016-02-01      US            gdp 2.000000
#  3 2016-03-01      AU            gdp 3.000000
#  4 2016-04-01      US            gdp 4.000000
#  5 2016-05-01      US            gdp 5.000000
#  6 2016-02-01      US     population 1.000000
#  7 2016-03-01      AU     population 2.000000
#  8 2016-04-01      US     population 3.000000
#  9 2016-05-01      US     population 4.000000
# 10 2016-06-01      US     population 5.000000
# 11 2016-02-01      US gdp per capita 2.000000
# 12 2016-03-01      AU gdp per capita 1.500000
# 13 2016-04-01      US gdp per capita 1.333333
# 14 2016-05-01      US gdp per capita 1.250000
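
The date matching comes from the reshape step: spreading puts each indicator in its own column, keyed by date and country, so the ratio is only defined where both values exist. A minimal sketch of that intermediate table, reusing the example data from above (the name `wide` is only for illustration):

```r
library(dplyr)
library(tidyr)

# Same example data as in the answer above
df <- tribble(
         ~date, ~value, ~country,   ~indicator,
  "2016-01-01",      1,     "US",        "gdp",
  "2016-02-01",      2,     "US",        "gdp",
  "2016-03-01",      3,     "AU",        "gdp",
  "2016-04-01",      4,     "US",        "gdp",
  "2016-05-01",      5,     "US",        "gdp",
  "2016-02-01",      1,     "US", "population",
  "2016-03-01",      2,     "AU", "population",
  "2016-04-01",      3,     "US", "population",
  "2016-05-01",      4,     "US", "population",
  "2016-06-01",      5,     "US", "population"
)

# One row per date/country, one column per indicator;
# date/country combinations missing from the long data become NA
wide <- df %>%
  spread(indicator, value) %>%
  mutate(`gdp per capita` = gdp / population)

wide
```

The rows for 2016-01-01 (no population) and 2016-06-01 (no gdp) get NA for the ratio, which is why the pipeline ends with drop_na(value) after gathering.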

N.B. I've modified the data and added a group_by statement to demonstrate the solution with multiple values for country.
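
In tidyr 1.0.0 and later, spread() and gather() are superseded by pivot_wider() and pivot_longer(). A rough equivalent of the pipeline above, assuming a recent tidyr is installed, might look like this (the name `result` is only for illustration):

```r
library(dplyr)
library(tidyr)

# Same example data as in the answer above
df <- tribble(
         ~date, ~value, ~country,   ~indicator,
  "2016-01-01",      1,     "US",        "gdp",
  "2016-02-01",      2,     "US",        "gdp",
  "2016-03-01",      3,     "AU",        "gdp",
  "2016-04-01",      4,     "US",        "gdp",
  "2016-05-01",      5,     "US",        "gdp",
  "2016-02-01",      1,     "US", "population",
  "2016-03-01",      2,     "AU", "population",
  "2016-04-01",      3,     "US", "population",
  "2016-05-01",      4,     "US", "population",
  "2016-06-01",      5,     "US", "population"
)

result <- df %>%
  # Wide form: one column per indicator, keyed by date and country
  pivot_wider(names_from = indicator, values_from = value) %>%
  # Ratio is NA wherever gdp or population is missing for that date
  mutate(`gdp per capita` = gdp / population) %>%
  # Back to long form, dropping the NA rows in the same step
  pivot_longer(
    cols = -c(date, country),
    names_to = "indicator",
    values_to = "value",
    values_drop_na = TRUE
  )

result
```

The rows come back in a different order than with gather() (grouped by date rather than by indicator), so add arrange(indicator, date) if the order matters. For the scaling mentioned in the question, multiply inside mutate(), e.g. gdp / population * 1000.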
