Reputation: 538
I have a data.table()
with the columns : city,month, year and gdp_per_capta - by year, its value repeat all months. (actually I have 30 more columns)
My base is full fill from 2005 to 2018. But, to the year 2019, the gdp_per_capta has NA.
I would like to put a value on it, this way: 2019 = (2018/2017) * 2018
How can I do it?
PS: How can I improve the title of this question?
Example of my data:
structure(list(city = c(110001L, 110001L, 110001L, 110001L,
110001L, 110001L, 110001L, 110001L, 110001L, 110001L, 110001L,
110001L, 110037L, 110037L, 110037L, 110037L, 110037L, 110037L,
110037L, 110037L, 110037L, 110037L, 110037L, 110037L, 110040L,
110040L, 110040L, 110040L, 110040L, 110040L, 110040L, 110040L,
110040L, 110040L, 110040L, 110040L), month = c("01", "04", "07",
"10", "02", "05", "08", "11", "03", "06", "09", "12", "01", "04",
"07", "10", "02", "05", "08", "11", "03", "06", "09", "12", "01",
"04", "07", "10", "02", "05", "08", "11", "03", "06", "09", "12"
), year = c(2017L, 2017L, 2017L, 2017L, 2018L, 2018L, 2018L, 2018L,
2019L, 2019L, 2019L, 2019L, 2017L, 2017L, 2017L, 2017L, 2018L,
2018L, 2018L, 2018L, 2019L, 2019L, 2019L, 2019L, 2017L, 2017L,
2017L, 2017L, 2018L, 2018L, 2018L, 2018L, 2019L, 2019L, 2019L,
2019L), gdp_per_capta = c(19081.43, 19081.43, 19081.43,
19081.43, 21552.47, 21552.47, 21552.47, 21552.47, NA, NA, NA,
NA, 23353.6, 23353.6, 23353.6, 23353.6, 21053.93, 21053.93, 21053.93,
21053.93, NA, NA, NA, NA, 14699.7, 14699.7, 14699.7, 14699.7,
15655.57, 15655.57, 15655.57, 15655.57, NA, NA, NA, NA)), row.names = c(NA,
-36L), class = c("data.table", "data.frame"))
Upvotes: 0
Views: 131
Reputation: 42544
As the question is tagged with data.table
, Here are two data.table
approaches:
fifelse()
DT[, gdp_per_capta := fifelse(is.na(gdp_per_capta) & year == 2019L,
last(gdp_per_capta[year == 2018L])^2 / last(gdp_per_capta[year == 2017L]),
gdp_per_capta), by = city][]
fcoalesce()
Here, a table containing the replacement values is being joined:
rDT <- DT[, .(year = 2019L, last(gdp_per_capta[year == 2018L])^2 / last(gdp_per_capta[year == 2017L])),
by = city]
DT[rDT, on = .(city, year), gdp_per_capta := fcoalesce(x.gdp_per_capta, i.V2)][]
The replacement table rDT
is
rDT
city year V2 1: 110001 2019 24343.51 2: 110037 2019 18980.71 3: 110040 2019 16673.60
Upvotes: 2
Reputation: 160437
I'll assume that month
is ordinal and not numeric. Because my solution pivots, I'll temporarily move month
out of the way in lieu of integers.
# store the real month/month_i translation, per-city/year
DT0 <- DT[, .(city, month, year, month_i)]
# pivot wide, calculate, then melt-long
DTnew <- DT[, month := NULL] %>%
dcast(., city + month_i ~ year, value.var = "gdp_per_capta") %>%
.[, `2019` := (`2018`/`2017`)*`2018`] %>%
melt(id.vars = c("city", "month_i"), variable.name = "year", value.name = "gdp_per_capta",
variable.factor = FALSE) %>%
.[, year := as.integer(year) ]
DTnew
# city month_i year gdp_per_capta
# <int> <int> <int> <num>
# 1: 110001 1 2017 19081.43
# 2: 110001 2 2017 19081.43
# 3: 110001 3 2017 19081.43
# 4: 110001 4 2017 19081.43
# 5: 110037 1 2017 23353.60
# ---
# 32: 110037 4 2019 18980.71
# 33: 110040 1 2019 16673.60
# 34: 110040 2 2019 16673.60
# 35: 110040 3 2019 16673.60
# 36: 110040 4 2019 16673.60
# rejoin to bring back the original months
DTnew <- DT0[DTnew, on = .(city, month_i, year)][, month_i := NULL]
DTnew
# city month year gdp_per_capta
# <int> <char> <int> <num>
# 1: 110001 01 2017 19081.43
# 2: 110001 04 2017 19081.43
# 3: 110001 07 2017 19081.43
# 4: 110001 10 2017 19081.43
# 5: 110037 01 2017 23353.60
# ---
# 32: 110037 12 2019 18980.71
# 33: 110040 03 2019 16673.60
# 34: 110040 06 2019 16673.60
# 35: 110040 09 2019 16673.60
# 36: 110040 12 2019 16673.60
Upvotes: 2
Reputation: 4487
Here is a solution! For data that is.na
and year == 2019
take the first value of 2018 & first value of 2017 to calculate as your formula.
library(dplyr)
df %>%
group_by(city) %>%
mutate(gdp_per_capta_revised = if_else(is.na(gdp_per_capta) & year == 2019,
(first(gdp_per_capta[year == 2018]) /
first(gdp_per_capta[year == 2017])) *
first(gdp_per_capta[year == 2018]),
gdp_per_capta))
Here is the brief look at the result for those NA
in 2019 - The other year will have same value as original gdp_per_capta
.
city month year gdp_per_capta gdp_per_capta_revised
<int> <chr> <int> <dbl> <dbl>
1 110001 03 2019 NA 24344.
2 110001 06 2019 NA 24344.
3 110001 09 2019 NA 24344.
4 110001 12 2019 NA 24344.
5 110037 03 2019 NA 18981.
6 110037 06 2019 NA 18981.
7 110037 09 2019 NA 18981.
8 110037 12 2019 NA 18981.
9 110040 03 2019 NA 16674.
10 110040 06 2019 NA 16674.
11 110040 09 2019 NA 16674.
12 110040 12 2019 NA 16674.
Upvotes: 1