RxT
RxT

Reputation: 538

How can I complete a data.table column with values from this column?

I have a data.table() with the columns : city,month, year and gdp_per_capta - by year, its value repeat all months. (actually I have 30 more columns)

My base is full fill from 2005 to 2018. But, to the year 2019, the gdp_per_capta has NA.

I would like to put a value on it, this way: 2019 = (2018/2017) * 2018

How can I do it?

PS: How can I improve the title of this question?

Example of my data:

structure(list(city = c(110001L, 110001L, 110001L, 110001L, 
                                      110001L, 110001L, 110001L, 110001L, 110001L, 110001L, 110001L, 
                                      110001L, 110037L, 110037L, 110037L, 110037L, 110037L, 110037L, 
                                      110037L, 110037L, 110037L, 110037L, 110037L, 110037L, 110040L, 
                                      110040L, 110040L, 110040L, 110040L, 110040L, 110040L, 110040L, 
                                      110040L, 110040L, 110040L, 110040L), month = c("01", "04", "07", 
                                                                                   "10", "02", "05", "08", "11", "03", "06", "09", "12", "01", "04", 
                                                                                   "07", "10", "02", "05", "08", "11", "03", "06", "09", "12", "01", 
                                                                                   "04", "07", "10", "02", "05", "08", "11", "03", "06", "09", "12"
                                      ), year = c(2017L, 2017L, 2017L, 2017L, 2018L, 2018L, 2018L, 2018L, 
                                                 2019L, 2019L, 2019L, 2019L, 2017L, 2017L, 2017L, 2017L, 2018L, 
                                                 2018L, 2018L, 2018L, 2019L, 2019L, 2019L, 2019L, 2017L, 2017L, 
                                                 2017L, 2017L, 2018L, 2018L, 2018L, 2018L, 2019L, 2019L, 2019L, 
                                                 2019L), gdp_per_capta = c(19081.43, 19081.43, 19081.43, 
                                                                                  19081.43, 21552.47, 21552.47, 21552.47, 21552.47, NA, NA, NA, 
                                                                                  NA, 23353.6, 23353.6, 23353.6, 23353.6, 21053.93, 21053.93, 21053.93, 
                                                                                  21053.93, NA, NA, NA, NA, 14699.7, 14699.7, 14699.7, 14699.7, 
                                                                                  15655.57, 15655.57, 15655.57, 15655.57, NA, NA, NA, NA)), row.names = c(NA, 
                                                                                                                                                          -36L), class = c("data.table", "data.frame"))

My expected result is in red: enter image description here

Upvotes: 0

Views: 131

Answers (3)

Uwe
Uwe

Reputation: 42544

As the question is tagged with data.table, Here are two data.table approaches:

1. fifelse()

DT[, gdp_per_capta := fifelse(is.na(gdp_per_capta) & year == 2019L, 
                              last(gdp_per_capta[year == 2018L])^2 / last(gdp_per_capta[year == 2017L]),
                              gdp_per_capta), by = city][]

2. join & fcoalesce()

Here, a table containing the replacement values is being joined:

rDT <- DT[, .(year = 2019L, last(gdp_per_capta[year == 2018L])^2 / last(gdp_per_capta[year == 2017L])), 
          by = city]
DT[rDT, on = .(city, year), gdp_per_capta := fcoalesce(x.gdp_per_capta, i.V2)][]

The replacement table rDT is

rDT
     city year       V2
1: 110001 2019 24343.51
2: 110037 2019 18980.71
3: 110040 2019 16673.60

Upvotes: 2

r2evans
r2evans

Reputation: 160437

I'll assume that month is ordinal and not numeric. Because my solution pivots, I'll temporarily move month out of the way in lieu of integers.

# store the real month/month_i translation, per-city/year
DT0 <- DT[, .(city, month, year, month_i)]

# pivot wide, calculate, then melt-long
DTnew <- DT[, month := NULL] %>%
  dcast(., city + month_i ~ year, value.var = "gdp_per_capta") %>%
  .[, `2019` := (`2018`/`2017`)*`2018`] %>%
  melt(id.vars = c("city", "month_i"), variable.name = "year", value.name = "gdp_per_capta",
       variable.factor = FALSE) %>%
  .[, year := as.integer(year) ]
DTnew
#       city month_i  year gdp_per_capta
#      <int>   <int> <int>         <num>
#  1: 110001       1  2017      19081.43
#  2: 110001       2  2017      19081.43
#  3: 110001       3  2017      19081.43
#  4: 110001       4  2017      19081.43
#  5: 110037       1  2017      23353.60
# ---                                   
# 32: 110037       4  2019      18980.71
# 33: 110040       1  2019      16673.60
# 34: 110040       2  2019      16673.60
# 35: 110040       3  2019      16673.60
# 36: 110040       4  2019      16673.60

# rejoin to bring back the original months
DTnew <- DT0[DTnew, on = .(city, month_i, year)][, month_i := NULL]
DTnew
#       city  month  year gdp_per_capta
#      <int> <char> <int>         <num>
#  1: 110001     01  2017      19081.43
#  2: 110001     04  2017      19081.43
#  3: 110001     07  2017      19081.43
#  4: 110001     10  2017      19081.43
#  5: 110037     01  2017      23353.60
# ---                                  
# 32: 110037     12  2019      18980.71
# 33: 110040     03  2019      16673.60
# 34: 110040     06  2019      16673.60
# 35: 110040     09  2019      16673.60
# 36: 110040     12  2019      16673.60

Upvotes: 2

Sinh Nguyen
Sinh Nguyen

Reputation: 4487

Here is a solution! For data that is.na and year == 2019 take the first value of 2018 & first value of 2017 to calculate as your formula.

library(dplyr)

df %>% 
  group_by(city) %>%
  mutate(gdp_per_capta_revised = if_else(is.na(gdp_per_capta) & year == 2019,
                                 (first(gdp_per_capta[year == 2018]) / 
                                    first(gdp_per_capta[year == 2017])) *
                                   first(gdp_per_capta[year == 2018]),
                                  gdp_per_capta))

Here is the brief look at the result for those NA in 2019 - The other year will have same value as original gdp_per_capta.

     city month  year gdp_per_capta gdp_per_capta_revised
    <int> <chr> <int>         <dbl>                 <dbl>
 1 110001 03     2019            NA                24344.
 2 110001 06     2019            NA                24344.
 3 110001 09     2019            NA                24344.
 4 110001 12     2019            NA                24344.
 5 110037 03     2019            NA                18981.
 6 110037 06     2019            NA                18981.
 7 110037 09     2019            NA                18981.
 8 110037 12     2019            NA                18981.
 9 110040 03     2019            NA                16674.
10 110040 06     2019            NA                16674.
11 110040 09     2019            NA                16674.
12 110040 12     2019            NA                16674.

Upvotes: 1

Related Questions