Reputation: 171
I do have a problem, that I can't seem to solve efficiently.
Say the result of my loop is the following (which is made up).
library(tidyverse)
mytib <- tribble(
~year, ~month, ~shop_id, ~inventory,
2019, 01, "A", 200,
2019, 01, "B", 300,
2019, 01, "C", 240,
2019, 05, "A", 250,
2019, 05, "B", 400,
2019, 05, "D", 400,
2019, 08, "A", 300,
2019, 08, "B", 250,
2019, 08, "C", 200,
2019, 11, "A", 300,
2019, 11, "E", 250,)
That is, I only get results for January, May, August and November (because data is only available for these dates). However, I would like to carry values forward for February, March, June(those would get the January values. June and July will get the values from May and so on. I can't use "fill", since R doesn't "know" that there is February, March...etc (and I cannot "mutate" these months into "mytib".)
I came up with a solution which unfortunately is highly inefficient and prone to errors. The solution (in words) is the following: Create a grid with months from 1:12, and all unique shop_ids (see code below), then create separate vectors with the shops I observe in January, May, August and November.
Then group_split the grid (into a list) by month and year. Then reduce all unique shopids to those I observe. Join them alltogher (using dplyr::join_all), then left-join the original tibble to the reduced grid, and finally carry forward (.direction="down") observed values, so that I end up with a tibbe "result", which is what I want.
Although I reach my desired goal, I wonder whether there is a solution that is much more efficient and less prone to errors than my bumbling approach. Any help or hint is highly appreciated.
Ps. Please don't be too harsh to me, as I'm still pretty new to R.
Here is the complete code:
mytib <- tribble(
~year, ~month, ~shop_id, ~inventory,
2019, 01, "A", 200,
2019, 01, "B", 300,
2019, 01, "C", 240,
2019, 05, "A", 250,
2019, 05, "B", 400,
2019, 05, "D", 400,
2019, 08, "A", 300,
2019, 08, "B", 250,
2019, 08, "C", 200,
2019, 11, "A", 300,
2019, 11, "E", 250,)
grid <- expand.grid(year = 2019,
month = 1:12,
shop_id = unique(mytib$shop_id))
grid
jan2019 <- mytib %>% filter(year == 2019 & month ==01)
jan2019 <- jan2019$shop_id
may2019 <- mytib %>% filter(year == 2019 & month == 05)
may2019 <- may2019$shop_id
aug2019 <- mytib %>% filter(year == 2019 & month == 08)
aug2019 <-aug2019$shop_id
nov2019 <- mytib %>% filter(year == 2019 & month == 11)
nov2019 <- nov2019$shop_id
my_list <- grid %>% group_by(year, month) %>% group_split()
my_list
my_list[[1]] <- my_list[[1]][my_list[[1]]$shop_id %in% jan2019,] ; my_list[[1]]
my_list[[2]] <- my_list[[2]][my_list[[2]]$shop_id %in% jan2019,] ; my_list[[2]]
my_list[[3]] <- my_list[[3]][my_list[[3]]$shop_id %in% jan2019,] ; my_list[[3]]
my_list[[4]] <- my_list[[4]][my_list[[4]]$shop_id %in% jan2019,] ; my_list[[4]]
my_list[[5]] <- my_list[[5]][my_list[[5]]$shop_id %in% may2019,] ; my_list[[5]]
my_list[[6]] <- my_list[[6]][my_list[[6]]$shop_id %in% may2019,] ; my_list[[6]]
my_list[[7]] <- my_list[[7]][my_list[[7]]$shop_id %in% may2019,] ; my_list[[7]]
my_list[[8]] <- my_list[[8]][my_list[[8]]$shop_id %in% aug2019,] ; my_list[[8]]
my_list[[9]] <- my_list[[9]][my_list[[9]]$shop_id %in% aug2019,] ; my_list[[9]]
my_list[[10]]<- my_list[[10]][my_list[[10]]$shop_id %in% aug2019,];my_list[[10]]
my_list[[11]]<- my_list[[11]][my_list[[11]]$shop_id %in% nov2019,];my_list[[11]]
my_list[[12]]<- my_list[[12]][my_list[[12]]$shop_id %in% nov2019,];my_list[[12]]
result <- plyr::join_all(my_list, type="full")
result
result <- left_join(result, mytib, by=c("year", "month", "shop_id"))
result %>% group_by(shop_id) %>% fill(inventory,.direction = "down") %>% print(n=35)
Upvotes: 3
Views: 218
Reputation: 24878
Here's an approach using tidyr::complete
:
library(tidyverse)
mytib %>%
complete(month = 1:12, year, shop_id) %>%
group_by(shop_id) %>%
fill(inventory, .direction = "down") %>%
mutate(inventory = replace_na(inventory, 0))
## A tibble: 60 x 4
## Groups: shop_id [5]
# month year shop_id inventory
# <dbl> <dbl> <chr> <dbl>
# 1 1 2019 A 200
# 2 1 2019 B 300
# 3 1 2019 C 240
# 4 1 2019 D 0
# 5 1 2019 E 0
# 6 2 2019 A 200
# 7 2 2019 B 300
# 8 2 2019 C 240
# 9 2 2019 D 0
#10 2 2019 E 0
## … with 50 more rows
Upvotes: 1
Reputation: 1166
library(tidyverse)
mytib <- tribble(
~year, ~month, ~shop_id, ~inventory,
2019, 01, "A", 200,
2019, 01, "B", 300,
2019, 01, "C", 240,
2019, 05, "A", 250,
2019, 05, "B", 400,
2019, 05, "D", 400,
2019, 08, "A", 300,
2019, 08, "B", 250,
2019, 08, "C", 200,
2019, 11, "A", 300,
2019, 11, "E", 250)
grid <- expand.grid(year = 2019,
month = 1:12,
shop_id = unique(mytib$shop_id))
grid %>%
left_join(mytib)%>% # Since the columns are the same, left_join will automatically match the columns
group_by(shop_id) %>% # Apply following function on each shop_id
fill(inventory,.direction = "down") %>% #Since the data is ordered descending in time, direction="down" will use the past value to substitute the future value
ungroup() %>% # Remove the grouped condition
filter(!is.na(inventory)) #If the store just appear in later time, the earlier time is null, we want to remove this
# A tibble: 46 x 4
year month shop_id inventory
<dbl> <dbl> <chr> <dbl>
1 2019 1 A 200
2 2019 2 A 200
3 2019 3 A 200
4 2019 4 A 200
5 2019 5 A 250
6 2019 6 A 250
7 2019 7 A 250
8 2019 8 A 300
9 2019 9 A 300
10 2019 10 A 300
# … with 36 more rows
Upvotes: 1
Reputation: 1525
Your code is good it may be worth rewriting now you know what each function does as the essential functions I have used are contained within your code . To add the results to with missing values in all data we can use the left_join
or right_join
functions which are both able to match up certain values and maintain either all entries in the first or second arguments.
After this we group by the shop_id
as we want to fill in the values separately for each shop. Then we fill in the values using tidyr::fill()
specifying that we want to fill in the down direction (i.e ascending dates). Finally we use filter to remove the NA
results.
library(tidyverse)
mytib <- tribble(
~year, ~month, ~shop_id, ~inventory,
2019, 01, "A", 200,
2019, 01, "B", 300,
2019, 01, "C", 240,
2019, 05, "A", 250,
2019, 05, "B", 400,
2019, 05, "D", 400,
2019, 08, "A", 300,
2019, 08, "B", 250,
2019, 08, "C", 200,
2019, 11, "A", 300,
2019, 11, "E", 250,)
grid <- expand.grid(year = 2019,
month = 1:12,
shop_id = unique(mytib$shop_id))
left_join(grid, mytib, by = c("year" = "year", "month" = "month", "shop_id" = "shop_id")) %>%
group_by(shop_id) %>%
fill(inventory, .direction = "down") %>%
filter(!is.na(inventory))
#> # A tibble: 46 x 4
#> # Groups: shop_id [5]
#> year month shop_id inventory
#> <dbl> <dbl> <chr> <dbl>
#> 1 2019 1 A 200
#> 2 2019 2 A 200
#> 3 2019 3 A 200
#> 4 2019 4 A 200
#> 5 2019 5 A 250
#> 6 2019 6 A 250
#> 7 2019 7 A 250
#> 8 2019 8 A 300
#> 9 2019 9 A 300
#> 10 2019 10 A 300
#> # ... with 36 more rows
Created on 2021-04-07 by the reprex package (v2.0.0)
Upvotes: 1