Aimen Sattar
Aimen Sattar

Reputation: 23

Adding rows and interpolating values

I have a dataframe with string data and values for Year, GDP and population. The series is every 5 years between 2010 and 2100 for each country.

Model Scenario Year population Country gdp SSP Version Date
1 OECD SSP2_v9_130325 2010 0.102 Aruba 0.00247 SSP2 v9 130325
2 OECD SSP2_v9_130325 2015 0.112 Aruba 0.00335 SSP2 v9 130325
3 OECD SSP2_v9_130325 2020 0.119 Aruba 0.00431 SSP2 v9 130325
4 OECD SSP2_v9_130325 2025 0.125 Aruba 0.00559 SSP2 v9 130325
5 OECD SSP2_v9_130325 2030 0.131 Aruba 0.00722 SSP2 v9 130325
6 OECD SSP2_v9_130325 2035 0.136 Aruba 0.00918 SSP2 v9 130325

I would like to calculate gdp and population annually by adding rows for each intermediate years and interpolating the GDP and population. I have grouped by country and used

complete(Year=full_seq(2010:2100,1)) %>% fill(Model, Scenario, Version, SSP, Date)

to create the years and fill for variables that remain the same. I get NA for GDP and population and cannot work out how to interpolate between existing values.

Country Year Model Scenario population gdp SSP Version Date
1 Afghanistan 2010 OECD SSP2_v9_130325 28.0 0.0159 SSP2 v9 130325
2 Afghanistan 2011 OECD SSP2_v9_130325 NA NA SSP2 v9 130325
3 Afghanistan 2012 OECD SSP2_v9_130325 NA NA SSP2 v9 130325
4 Afghanistan 2013 OECD SSP2_v9_130325 NA NA SSP2 v9 130325
5 Afghanistan 2014 OECD SSP2_v9_130325 NA NA SSP2 v9 130325
6 Afghanistan 2015 OECD SSP2_v9_130325 31.8 0.0206 SSP2 v9 130325

I have tried approx and approxm, but was not able to make them work. Thanks

Upvotes: 1

Views: 227

Answers (1)

Ben
Ben

Reputation: 30474

You could replace NA with interpolated values using na.approx from zoo package.

library(tidyverse)
library(zoo)

df %>%
  group_by(Country) %>%
  complete(Year = full_seq(2010:2100, 1)) %>% 
  fill(Model, Scenario, Version, SSP, Date) %>%
  mutate_at(vars(population, gdp), na.approx, rule = 2)

Country  Year Model Scenario       population     gdp SSP   Version   Date
   <chr>   <dbl> <chr> <chr>               <dbl>   <dbl> <chr> <chr>    <int>
 1 Aruba    2010 OECD  SSP2_v9_130325      0.102 0.00247 SSP2  v9      130325
 2 Aruba    2011 OECD  SSP2_v9_130325      0.104 0.00265 SSP2  v9      130325
 3 Aruba    2012 OECD  SSP2_v9_130325      0.106 0.00282 SSP2  v9      130325
 4 Aruba    2013 OECD  SSP2_v9_130325      0.108 0.00300 SSP2  v9      130325
 5 Aruba    2014 OECD  SSP2_v9_130325      0.11  0.00317 SSP2  v9      130325
 6 Aruba    2015 OECD  SSP2_v9_130325      0.112 0.00335 SSP2  v9      130325
 7 Aruba    2016 OECD  SSP2_v9_130325      0.113 0.00354 SSP2  v9      130325
 8 Aruba    2017 OECD  SSP2_v9_130325      0.115 0.00373 SSP2  v9      130325
 9 Aruba    2018 OECD  SSP2_v9_130325      0.116 0.00393 SSP2  v9      130325
10 Aruba    2019 OECD  SSP2_v9_130325      0.118 0.00412 SSP2  v9      130325
# … with 81 more rows

I assumed you wanted to group_by Country. Also, this includes rule = 2 to extrapolate outside of the range of time points (your sample data only goes to year 2035). The extrapolated values beyond 2035 will be constant based on the most recent value.

If you wish to explore alternatives to extrapolation in addition to interpolation, you could try using na.spline also from zoo:

df %>%
  group_by(Country) %>%
  complete(Year = full_seq(2010:2100,1)) %>% 
  fill(Model, Scenario, Version, SSP, Date) %>%
  mutate_at(vars(population, gdp), na.spline, method = "natural")

Here you can see a difference in extrapolated values beyond 2035:

...
21 Aruba    2030 OECD  SSP2_v9_130325      0.131 0.00722 SSP2  v9      130325
22 Aruba    2031 OECD  SSP2_v9_130325      0.132 0.00759 SSP2  v9      130325
23 Aruba    2032 OECD  SSP2_v9_130325      0.133 0.00798 SSP2  v9      130325
24 Aruba    2033 OECD  SSP2_v9_130325      0.134 0.00837 SSP2  v9      130325
25 Aruba    2034 OECD  SSP2_v9_130325      0.135 0.00877 SSP2  v9      130325
26 Aruba    2035 OECD  SSP2_v9_130325      0.136 0.00918 SSP2  v9      130325
27 Aruba    2036 OECD  SSP2_v9_130325      0.137 0.00959 SSP2  v9      130325
28 Aruba    2037 OECD  SSP2_v9_130325      0.138 0.00999 SSP2  v9      130325
29 Aruba    2038 OECD  SSP2_v9_130325      0.139 0.0104  SSP2  v9      130325
30 Aruba    2039 OECD  SSP2_v9_130325      0.140 0.0108  SSP2  v9      130325
31 Aruba    2040 OECD  SSP2_v9_130325      0.141 0.0112  SSP2  v9      130325
32 Aruba    2041 OECD  SSP2_v9_130325      0.142 0.0116  SSP2  v9      130325
33 Aruba    2042 OECD  SSP2_v9_130325      0.143 0.0120  SSP2  v9      130325
34 Aruba    2043 OECD  SSP2_v9_130325      0.144 0.0124  SSP2  v9      130325
35 Aruba    2044 OECD  SSP2_v9_130325      0.145 0.0128  SSP2  v9      130325
...

Data

df <- structure(list(Model = c("OECD", "OECD", "OECD", "OECD", "OECD", 
"OECD"), Scenario = c("SSP2_v9_130325", "SSP2_v9_130325", "SSP2_v9_130325", 
"SSP2_v9_130325", "SSP2_v9_130325", "SSP2_v9_130325"), Year = c(2010L, 
2015L, 2020L, 2025L, 2030L, 2035L), population = c(0.102, 0.112, 
0.119, 0.125, 0.131, 0.136), Country = c("Aruba", "Aruba", "Aruba", 
"Aruba", "Aruba", "Aruba"), gdp = c(0.00247, 0.00335, 0.00431, 
0.00559, 0.00722, 0.00918), SSP = c("SSP2", "SSP2", "SSP2", "SSP2", 
"SSP2", "SSP2"), Version = c("v9", "v9", "v9", "v9", "v9", "v9"
), Date = c(130325L, 130325L, 130325L, 130325L, 130325L, 130325L
)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", 
"6"))

Upvotes: 0

Related Questions