Reputation: 375
I created a simplified version of my data since my actual data is too large to be include here.
structure(list(Name = c("A", "A", "A", "A", "B", "B", "B", "B",
"C", "C", "C", "C"), Category = c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), Year = c(2018L, 2019L, 2020L, 2021L, 2018L,
2019L, 2020L, 2021L, 2018L, 2019L, 2020L, 2021L), Cost = c(NA,
100L, NA, NA, 99L, 45L, 34L, NA, 64L, 34L, NA, 34L)), class = "data.frame", row.names = c(NA,
-12L))
The focus here is on the column "Cost" . If there is NA, in a first step, I use the most recent year with available data in the respective category.
Data %>%
group_by(`Category`, Year) %>%
dplyr:: mutate(Cost = na.locf(Cost))
The problem now is that for "Name" A, there is no value in the "Year" 2018. That's why I get an error. How do I need to adjust, the code, that then in a second step, if there is no data available from prior year for this respective "Category", I use the data from the next available year. In this specific case it would be "Year" 2019 and the value 100.
Thank you for your help in advance.
Upvotes: 0
Views: 94
Reputation: 52219
Use tidyr::fill
with .direction = "downup"
:
library(tidyr)
library(dplyr)
Data %>%
group_by(Category) %>%
fill(Cost, .direction = "downup")
# A tibble: 12 × 4
# Groups: Category [2]
Name Category Year Cost
<chr> <int> <int> <int>
1 A 1 2018 100
2 A 1 2019 100
3 A 1 2020 100
4 A 1 2021 100
5 B 2 2018 99
6 B 2 2019 45
7 B 2 2020 34
8 B 2 2021 34
9 C 2 2018 64
10 C 2 2019 34
11 C 2 2020 34
12 C 2 2021 34
Upvotes: 1