minimouse
minimouse

Reputation: 171

R Tidyverse: Carry forward values for non existing variables

I do have a problem, that I can't seem to solve efficiently.

Say the result of my loop is the following (which is made up).

library(tidyverse)
  mytib <- tribble(
  ~year,  ~month, ~shop_id, ~inventory,
  2019, 01, "A", 200,
  2019, 01, "B", 300,
  2019, 01, "C", 240,
  2019, 05, "A", 250,
  2019, 05, "B", 400,
  2019, 05, "D", 400,
  2019, 08, "A", 300,
  2019, 08, "B", 250,
  2019, 08, "C", 200,
  2019, 11, "A", 300,
  2019, 11, "E", 250,)

That is, I only get results for January, May, August and November (because data is only available for these dates). However, I would like to carry values forward for February, March, June(those would get the January values. June and July will get the values from May and so on. I can't use "fill", since R doesn't "know" that there is February, March...etc (and I cannot "mutate" these months into "mytib".)

I came up with a solution which unfortunately is highly inefficient and prone to errors. The solution (in words) is the following: Create a grid with months from 1:12, and all unique shop_ids (see code below), then create separate vectors with the shops I observe in January, May, August and November.

Then group_split the grid (into a list) by month and year. Then reduce all unique shopids to those I observe. Join them alltogher (using dplyr::join_all), then left-join the original tibble to the reduced grid, and finally carry forward (.direction="down") observed values, so that I end up with a tibbe "result", which is what I want.

Although I reach my desired goal, I wonder whether there is a solution that is much more efficient and less prone to errors than my bumbling approach. Any help or hint is highly appreciated.

Ps. Please don't be too harsh to me, as I'm still pretty new to R.

Here is the complete code:

mytib <- tribble(
  ~year,  ~month, ~shop_id, ~inventory,
  2019, 01, "A", 200,
  2019, 01, "B", 300,
  2019, 01, "C", 240,
  2019, 05, "A", 250,
  2019, 05, "B", 400,
  2019, 05, "D", 400,
  2019, 08, "A", 300,
  2019, 08, "B", 250,
  2019, 08, "C", 200,
  2019, 11, "A", 300,
  2019, 11, "E", 250,)


grid <- expand.grid(year = 2019, 
                    month = 1:12, 
                   shop_id = unique(mytib$shop_id))


grid

jan2019 <- mytib %>% filter(year == 2019 & month ==01)
jan2019 <- jan2019$shop_id

may2019 <- mytib %>% filter(year == 2019 & month == 05)
may2019 <- may2019$shop_id

aug2019 <- mytib %>% filter(year == 2019 & month == 08)
aug2019  <-aug2019$shop_id

nov2019 <- mytib %>% filter(year == 2019 & month == 11)
nov2019 <- nov2019$shop_id


my_list <- grid %>% group_by(year, month) %>% group_split()
my_list



my_list[[1]] <- my_list[[1]][my_list[[1]]$shop_id %in% jan2019,] ; my_list[[1]]
my_list[[2]] <- my_list[[2]][my_list[[2]]$shop_id %in% jan2019,] ; my_list[[2]]
my_list[[3]] <- my_list[[3]][my_list[[3]]$shop_id %in% jan2019,] ; my_list[[3]]
my_list[[4]] <- my_list[[4]][my_list[[4]]$shop_id %in% jan2019,] ; my_list[[4]]
my_list[[5]] <- my_list[[5]][my_list[[5]]$shop_id %in% may2019,] ; my_list[[5]]
my_list[[6]] <- my_list[[6]][my_list[[6]]$shop_id %in% may2019,] ; my_list[[6]]
my_list[[7]] <- my_list[[7]][my_list[[7]]$shop_id %in% may2019,] ; my_list[[7]]
my_list[[8]] <- my_list[[8]][my_list[[8]]$shop_id %in% aug2019,] ; my_list[[8]]
my_list[[9]] <- my_list[[9]][my_list[[9]]$shop_id %in% aug2019,] ; my_list[[9]]
my_list[[10]]<- my_list[[10]][my_list[[10]]$shop_id %in% aug2019,];my_list[[10]]
my_list[[11]]<- my_list[[11]][my_list[[11]]$shop_id %in% nov2019,];my_list[[11]]
my_list[[12]]<- my_list[[12]][my_list[[12]]$shop_id %in% nov2019,];my_list[[12]]

result <- plyr::join_all(my_list, type="full")
result
result <- left_join(result, mytib, by=c("year", "month", "shop_id"))
result %>% group_by(shop_id) %>% fill(inventory,.direction =  "down") %>% print(n=35)

Upvotes: 3

Views: 218

Answers (3)

Ian Campbell
Ian Campbell

Reputation: 24878

Here's an approach using tidyr::complete:

library(tidyverse)
mytib %>%
  complete(month = 1:12, year, shop_id) %>%
  group_by(shop_id) %>%
  fill(inventory, .direction = "down") %>%
  mutate(inventory = replace_na(inventory, 0))
## A tibble: 60 x 4
## Groups:   shop_id [5]
#   month  year shop_id inventory
#   <dbl> <dbl> <chr>       <dbl>
# 1     1  2019 A             200
# 2     1  2019 B             300
# 3     1  2019 C             240
# 4     1  2019 D               0
# 5     1  2019 E               0
# 6     2  2019 A             200
# 7     2  2019 B             300
# 8     2  2019 C             240
# 9     2  2019 D               0
#10     2  2019 E               0
## … with 50 more rows

Upvotes: 1

Vinson Ciawandy
Vinson Ciawandy

Reputation: 1166

library(tidyverse)
mytib <- tribble(
  ~year,  ~month, ~shop_id, ~inventory,
  2019, 01, "A", 200,
  2019, 01, "B", 300,
  2019, 01, "C", 240,
  2019, 05, "A", 250,
  2019, 05, "B", 400,
  2019, 05, "D", 400,
  2019, 08, "A", 300,
  2019, 08, "B", 250,
  2019, 08, "C", 200,
  2019, 11, "A", 300,
  2019, 11, "E", 250)

grid <- expand.grid(year = 2019, 
                    month = 1:12, 
                    shop_id = unique(mytib$shop_id))
grid %>% 
  left_join(mytib)%>% # Since the columns are the same, left_join will automatically match the columns
  group_by(shop_id) %>% # Apply following function on each shop_id
  fill(inventory,.direction = "down") %>% #Since the data is ordered descending in time, direction="down" will use the past value to substitute the future value
  ungroup() %>% # Remove the grouped condition
  filter(!is.na(inventory)) #If the store just appear in later time, the earlier time is null, we want to remove this

# A tibble: 46 x 4
    year month shop_id inventory
   <dbl> <dbl> <chr>       <dbl>
 1  2019     1 A             200
 2  2019     2 A             200
 3  2019     3 A             200
 4  2019     4 A             200
 5  2019     5 A             250
 6  2019     6 A             250
 7  2019     7 A             250
 8  2019     8 A             300
 9  2019     9 A             300
10  2019    10 A             300
# … with 36 more rows

Upvotes: 1

Joel Kandiah
Joel Kandiah

Reputation: 1525

Your code is good it may be worth rewriting now you know what each function does as the essential functions I have used are contained within your code . To add the results to with missing values in all data we can use the left_join or right_join functions which are both able to match up certain values and maintain either all entries in the first or second arguments.

After this we group by the shop_id as we want to fill in the values separately for each shop. Then we fill in the values using tidyr::fill() specifying that we want to fill in the down direction (i.e ascending dates). Finally we use filter to remove the NA results.

library(tidyverse)


mytib <- tribble(
  ~year,  ~month, ~shop_id, ~inventory,
  2019, 01, "A", 200,
  2019, 01, "B", 300,
  2019, 01, "C", 240,
  2019, 05, "A", 250,
  2019, 05, "B", 400,
  2019, 05, "D", 400,
  2019, 08, "A", 300,
  2019, 08, "B", 250,
  2019, 08, "C", 200,
  2019, 11, "A", 300,
  2019, 11, "E", 250,)


grid <- expand.grid(year = 2019, 
                    month = 1:12, 
                    shop_id = unique(mytib$shop_id))

left_join(grid, mytib, by = c("year" = "year", "month" = "month", "shop_id" = "shop_id")) %>%
  group_by(shop_id) %>% 
  fill(inventory, .direction = "down") %>% 
  filter(!is.na(inventory))
#> # A tibble: 46 x 4
#> # Groups:   shop_id [5]
#>     year month shop_id inventory
#>    <dbl> <dbl> <chr>       <dbl>
#>  1  2019     1 A             200
#>  2  2019     2 A             200
#>  3  2019     3 A             200
#>  4  2019     4 A             200
#>  5  2019     5 A             250
#>  6  2019     6 A             250
#>  7  2019     7 A             250
#>  8  2019     8 A             300
#>  9  2019     9 A             300
#> 10  2019    10 A             300
#> # ... with 36 more rows

Created on 2021-04-07 by the reprex package (v2.0.0)

Upvotes: 1

Related Questions