Reputation: 23
I have a dataframe with string data and values for Year, GDP and population. The series is every 5 years between 2010 and 2100 for each country.
Model | Scenario | Year | population | Country | gdp | SSP | Version | Date | |
---|---|---|---|---|---|---|---|---|---|
1 | OECD | SSP2_v9_130325 | 2010 | 0.102 | Aruba | 0.00247 | SSP2 | v9 | 130325 |
2 | OECD | SSP2_v9_130325 | 2015 | 0.112 | Aruba | 0.00335 | SSP2 | v9 | 130325 |
3 | OECD | SSP2_v9_130325 | 2020 | 0.119 | Aruba | 0.00431 | SSP2 | v9 | 130325 |
4 | OECD | SSP2_v9_130325 | 2025 | 0.125 | Aruba | 0.00559 | SSP2 | v9 | 130325 |
5 | OECD | SSP2_v9_130325 | 2030 | 0.131 | Aruba | 0.00722 | SSP2 | v9 | 130325 |
6 | OECD | SSP2_v9_130325 | 2035 | 0.136 | Aruba | 0.00918 | SSP2 | v9 | 130325 |
I would like to calculate gdp and population annually by adding rows for each intermediate years and interpolating the GDP and population. I have grouped by country and used
complete(Year=full_seq(2010:2100,1)) %>% fill(Model, Scenario, Version, SSP, Date)
to create the years and fill for variables that remain the same. I get NA for GDP and population and cannot work out how to interpolate between existing values.
Country | Year | Model | Scenario | population | gdp | SSP | Version | Date | |
---|---|---|---|---|---|---|---|---|---|
1 | Afghanistan | 2010 | OECD | SSP2_v9_130325 | 28.0 | 0.0159 | SSP2 | v9 | 130325 |
2 | Afghanistan | 2011 | OECD | SSP2_v9_130325 | NA | NA | SSP2 | v9 | 130325 |
3 | Afghanistan | 2012 | OECD | SSP2_v9_130325 | NA | NA | SSP2 | v9 | 130325 |
4 | Afghanistan | 2013 | OECD | SSP2_v9_130325 | NA | NA | SSP2 | v9 | 130325 |
5 | Afghanistan | 2014 | OECD | SSP2_v9_130325 | NA | NA | SSP2 | v9 | 130325 |
6 | Afghanistan | 2015 | OECD | SSP2_v9_130325 | 31.8 | 0.0206 | SSP2 | v9 | 130325 |
I have tried approx and approxm, but was not able to make them work. Thanks
Upvotes: 1
Views: 227
Reputation: 30474
You could replace NA
with interpolated values using na.approx
from zoo
package.
library(tidyverse)
library(zoo)
df %>%
group_by(Country) %>%
complete(Year = full_seq(2010:2100, 1)) %>%
fill(Model, Scenario, Version, SSP, Date) %>%
mutate_at(vars(population, gdp), na.approx, rule = 2)
Country Year Model Scenario population gdp SSP Version Date
<chr> <dbl> <chr> <chr> <dbl> <dbl> <chr> <chr> <int>
1 Aruba 2010 OECD SSP2_v9_130325 0.102 0.00247 SSP2 v9 130325
2 Aruba 2011 OECD SSP2_v9_130325 0.104 0.00265 SSP2 v9 130325
3 Aruba 2012 OECD SSP2_v9_130325 0.106 0.00282 SSP2 v9 130325
4 Aruba 2013 OECD SSP2_v9_130325 0.108 0.00300 SSP2 v9 130325
5 Aruba 2014 OECD SSP2_v9_130325 0.11 0.00317 SSP2 v9 130325
6 Aruba 2015 OECD SSP2_v9_130325 0.112 0.00335 SSP2 v9 130325
7 Aruba 2016 OECD SSP2_v9_130325 0.113 0.00354 SSP2 v9 130325
8 Aruba 2017 OECD SSP2_v9_130325 0.115 0.00373 SSP2 v9 130325
9 Aruba 2018 OECD SSP2_v9_130325 0.116 0.00393 SSP2 v9 130325
10 Aruba 2019 OECD SSP2_v9_130325 0.118 0.00412 SSP2 v9 130325
# … with 81 more rows
I assumed you wanted to group_by
Country
. Also, this includes rule = 2
to extrapolate outside of the range of time points (your sample data only goes to year 2035). The extrapolated values beyond 2035 will be constant based on the most recent value.
If you wish to explore alternatives to extrapolation in addition to interpolation, you could try using na.spline
also from zoo
:
df %>%
group_by(Country) %>%
complete(Year = full_seq(2010:2100,1)) %>%
fill(Model, Scenario, Version, SSP, Date) %>%
mutate_at(vars(population, gdp), na.spline, method = "natural")
Here you can see a difference in extrapolated values beyond 2035:
...
21 Aruba 2030 OECD SSP2_v9_130325 0.131 0.00722 SSP2 v9 130325
22 Aruba 2031 OECD SSP2_v9_130325 0.132 0.00759 SSP2 v9 130325
23 Aruba 2032 OECD SSP2_v9_130325 0.133 0.00798 SSP2 v9 130325
24 Aruba 2033 OECD SSP2_v9_130325 0.134 0.00837 SSP2 v9 130325
25 Aruba 2034 OECD SSP2_v9_130325 0.135 0.00877 SSP2 v9 130325
26 Aruba 2035 OECD SSP2_v9_130325 0.136 0.00918 SSP2 v9 130325
27 Aruba 2036 OECD SSP2_v9_130325 0.137 0.00959 SSP2 v9 130325
28 Aruba 2037 OECD SSP2_v9_130325 0.138 0.00999 SSP2 v9 130325
29 Aruba 2038 OECD SSP2_v9_130325 0.139 0.0104 SSP2 v9 130325
30 Aruba 2039 OECD SSP2_v9_130325 0.140 0.0108 SSP2 v9 130325
31 Aruba 2040 OECD SSP2_v9_130325 0.141 0.0112 SSP2 v9 130325
32 Aruba 2041 OECD SSP2_v9_130325 0.142 0.0116 SSP2 v9 130325
33 Aruba 2042 OECD SSP2_v9_130325 0.143 0.0120 SSP2 v9 130325
34 Aruba 2043 OECD SSP2_v9_130325 0.144 0.0124 SSP2 v9 130325
35 Aruba 2044 OECD SSP2_v9_130325 0.145 0.0128 SSP2 v9 130325
...
Data
df <- structure(list(Model = c("OECD", "OECD", "OECD", "OECD", "OECD",
"OECD"), Scenario = c("SSP2_v9_130325", "SSP2_v9_130325", "SSP2_v9_130325",
"SSP2_v9_130325", "SSP2_v9_130325", "SSP2_v9_130325"), Year = c(2010L,
2015L, 2020L, 2025L, 2030L, 2035L), population = c(0.102, 0.112,
0.119, 0.125, 0.131, 0.136), Country = c("Aruba", "Aruba", "Aruba",
"Aruba", "Aruba", "Aruba"), gdp = c(0.00247, 0.00335, 0.00431,
0.00559, 0.00722, 0.00918), SSP = c("SSP2", "SSP2", "SSP2", "SSP2",
"SSP2", "SSP2"), Version = c("v9", "v9", "v9", "v9", "v9", "v9"
), Date = c(130325L, 130325L, 130325L, 130325L, 130325L, 130325L
)), class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6"))
Upvotes: 0