Reputation: 5681
I have this data frame.
library(dplyr)
df <- tibble(grp = c(1, 1, 1, 1, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7),
             count = c(NA, NA, NA, NA, NA, NA, NA, 6, 6, 6, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3),
             mdo = c(1500, 1500, 1500, 1500,
                     1500, 1500, NA, 0,
                     0, 0, 1100, 1100,
                     1100, 200, 200, 200,
                     1100, 1100, 1100, 0)
)
I want to do this computation.
df <- df %>%
  mutate(result = mdo / count)
The result:
grp count mdo result
<dbl> <dbl> <dbl> <dbl>
1 1 NA 1500 NA
2 1 NA 1500 NA
3 1 NA 1500 NA
4 1 NA 1500 NA
5 1 NA 1500 NA
6 1 NA 1500 NA
7 2 NA NA NA
8 3 6 0 0
9 3 6 0 0
10 3 6 0 0
11 4 3 1100 367.
12 4 3 1100 367.
13 4 3 1100 367.
14 5 3 200 66.7
15 5 3 200 66.7
16 5 3 200 66.7
17 6 3 1100 367.
18 6 3 1100 367.
19 6 3 1100 367.
20 7 3 0 0
Now I want to do the same computation, but when the mdo value of the previous group (grp) is zero, the result should be zero instead. So I want the result to be:
NA
NA
NA
NA
NA
NA
NA
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
66.66667
66.66667
66.66667
366.66667
366.66667
366.66667
0.00000
EDIT ---
Using this data
df <- tibble(grp = c(1, 1, 1, 1, 1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8),
             count = c(NA, NA, NA, NA, NA, NA, NA, 6, 6, 6, NA, NA, NA, NA, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3),
             mdo = c(1500, 1500, 1500, 1500, 1500, 1500,
                     NA, 0, 0, 0, NA, NA, NA, NA,
                     1100, 1100, 1100,
                     200, 200, 200,
                     1100, 1100, 1100, 0)
)
the suggested code (the version that produces the prev_mdo and result columns) gives:
grp count mdo prev_mdo result
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 NA 1500 NA NA
2 1 NA 1500 NA NA
3 1 NA 1500 NA NA
4 1 NA 1500 NA NA
5 1 NA 1500 NA NA
6 1 NA 1500 NA NA
7 2 NA NA 1500 NA
8 3 6 0 NA 0
9 3 6 0 NA 0
10 3 6 0 NA 0
11 4 NA NA 0 0
12 4 NA NA 0 0
13 4 NA NA 0 0
14 4 NA NA 0 0
15 5 3 1100 NA 367.
16 5 3 1100 NA 367.
17 5 3 1100 NA 367.
18 6 3 200 1100 66.7
19 6 3 200 1100 66.7
20 6 3 200 1100 66.7
21 7 3 1100 200 367.
22 7 3 1100 200 367.
23 7 3 1100 200 367.
24 8 3 0 1100 0
But I would expect the first 367. values (grp 5) to be zero: before 1100 we have NA values (which must be skipped), and before those NA values we have zero, so the result there should be zero. Instead, the current code takes the NA of grp 4 as the previous mdo and simply computes 1100 / 3.
Upvotes: 0
Views: 552
Reputation: 817
Assuming that you need the mdo value of the previous group and that, when that value is NA, you want to keep the original result, the following should work:
df %>%
  dplyr::left_join(df %>%
                     dplyr::distinct(grp, mdo) %>%
                     dplyr::mutate(prev_mdo = dplyr::lag(mdo, 1)) %>%
                     dplyr::select(-mdo),
                   by = "grp") %>%
  dplyr::mutate(result = mdo / count,
                result2 = dplyr::if_else(!is.na(prev_mdo) & prev_mdo == 0,
                                         0,
                                         result))
# A tibble: 20 x 6
grp count mdo prev_mdo result result2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 NA 1500 NA NA NA
2 1 NA 1500 NA NA NA
3 1 NA 1500 NA NA NA
4 1 NA 1500 NA NA NA
5 1 NA 1500 NA NA NA
6 1 NA 1500 NA NA NA
7 2 NA NA 1500 NA NA
8 3 6 0 NA 0 0
9 3 6 0 NA 0 0
10 3 6 0 NA 0 0
11 4 3 1100 0 367. 0
12 4 3 1100 0 367. 0
13 4 3 1100 0 367. 0
14 5 3 200 1100 66.7 66.7
15 5 3 200 1100 66.7 66.7
16 5 3 200 1100 66.7 66.7
17 6 3 1100 200 367. 367.
18 6 3 1100 200 367. 367.
19 6 3 1100 200 367. 367.
20 7 3 0 1100 0 0
Edit: now that I have read in more detail what you want to do, it's clear to me why my first solution felt somehow wrong. It felt wrong because it is wrong :D
Here is a solution that should fit your problem. You don't have to construct weird if-else conditions that try to mimic the output; you just have to prepare the source of the condition in the right way.
Long story short: you have to use a kind of nested lag ...
df %>%
  dplyr::left_join(df %>%
                     dplyr::distinct(grp, mdo) %>%
                     # for a group with mdo = NA, fall back to the mdo of the group before it
                     dplyr::mutate(mdo2 = dplyr::if_else(is.na(mdo), dplyr::lag(mdo, 1), mdo),
                                   prev_mdo = dplyr::lag(mdo2, 1)) %>%
                     dplyr::select(-mdo),
                   by = "grp") %>%
  dplyr::mutate(result = mdo / count,
                # prev_mdo is NA for the first group; keep the plain result there
                result2 = dplyr::if_else(!is.na(prev_mdo) & prev_mdo == 0,
                                         0,
                                         result))
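To make the nested lag easier to follow, here is a minimal sketch on a group-level copy of the edited data (one row per grp). The names `groups` and `lookup` are made up for illustration; the values are the distinct grp/mdo pairs from the question's edit.

```r
library(dplyr)

# One mdo value per group, taken from the edited data in the question
groups <- tibble(
  grp = 1:8,
  mdo = c(1500, NA, 0, NA, 1100, 200, 1100, 0)
)

lookup <- groups %>%
  mutate(
    # first lag: an NA group borrows the mdo of the group directly before it
    mdo2 = if_else(is.na(mdo), lag(mdo), mdo),
    # second lag: each group looks at the (possibly borrowed) mdo of its predecessor
    prev_mdo = lag(mdo2)
  )

lookup$prev_mdo
# NA 1500 1500 0 0 1100 200 1100
```

Group 5 now sees prev_mdo = 0 even though group 4 itself is NA. Note that the fallback only reaches back one group, so two or more consecutive NA groups would still leave an NA in prev_mdo.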
Upvotes: 1
Reputation: 5232
group_mdo <- df %>%
  select(grp, mdo) %>%
  unique() %>%
  mutate(prev_mdo = lag(mdo)) %>%
  select(-mdo)
df %>%
  left_join(group_mdo, by = "grp") %>%
  mutate(result = ifelse(prev_mdo != 0 | is.na(prev_mdo), mdo / count, 0))
gives:
grp count mdo prev_mdo result
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 NA 1500 NA NA
2 1 NA 1500 NA NA
3 1 NA 1500 NA NA
4 1 NA 1500 NA NA
5 1 NA 1500 NA NA
6 1 NA 1500 NA NA
7 2 NA NA 1500 NA
8 3 6 0 NA 0
9 3 6 0 NA 0
10 3 6 0 NA 0
11 4 3 1100 0 0
12 4 3 1100 0 0
13 4 3 1100 0 0
14 5 3 200 1100 66.7
15 5 3 200 1100 66.7
16 5 3 200 1100 66.7
17 6 3 1100 200 367.
18 6 3 1100 200 367.
19 6 3 1100 200 367.
20 7 3 0 1100 0
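The `| is.na(prev_mdo)` part of the condition matters because a comparison with NA yields NA, and `ifelse()` passes that NA straight through. A small sketch on a toy vector (the name `keep` is only illustrative):

```r
# Three possible situations for the previous group's mdo
prev_mdo <- c(NA, 0, 1100)

prev_mdo != 0
# NA FALSE TRUE

# NA | TRUE is TRUE, so a missing previous value falls back to the division
keep <- prev_mdo != 0 | is.na(prev_mdo)
keep
# TRUE FALSE TRUE

# Without the is.na() guard the NA propagates into the result
ifelse(prev_mdo != 0, "divide", "zero")
# NA "zero" "divide"
```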
EDIT
This should work for both cases now.
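group_mdo <- df %>%
  select(grp, mdo) %>%
  unique() %>%
  mutate(prev_mdo = lag(mdo)) %>%
  select(-mdo) %>%
  # carry the last non-NA previous mdo forward across NA groups
  tidyr::fill(prev_mdo, .direction = "down")
df %>%
  left_join(group_mdo, by = "grp") %>%
  # prev_mdo stays NA for the first group; compute the plain ratio there
  mutate(result = ifelse(is.na(prev_mdo) | prev_mdo != 0, mdo / count, 0))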
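One way to check the fill-based idea is to wrap it in a function and compare against the expected column from the question. This is only a sketch: `result_with_zero_rule()` is a hypothetical helper name, it guards against NA in prev_mdo explicitly, and it assumes mdo is constant within each grp (otherwise the join would duplicate rows).

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: zero the result when the last non-NA mdo
# of the preceding groups is zero, otherwise compute mdo / count
result_with_zero_rule <- function(df) {
  group_mdo <- df %>%
    distinct(grp, mdo) %>%                  # one row per group (assumes constant mdo)
    mutate(prev_mdo = lag(mdo)) %>%         # mdo of the group before
    select(-mdo) %>%
    fill(prev_mdo, .direction = "down")     # skip over NA groups

  df %>%
    left_join(group_mdo, by = "grp") %>%
    mutate(result = if_else(!is.na(prev_mdo) & prev_mdo == 0, 0, mdo / count))
}

# Original data from the question
df <- tibble(
  grp   = c(1, 1, 1, 1, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7),
  count = c(rep(NA, 7), 6, 6, 6, rep(3, 10)),
  mdo   = c(rep(1500, 6), NA, 0, 0, 0, rep(1100, 3), rep(200, 3), rep(1100, 3), 0)
)

# Expected result as listed in the question
expected <- c(rep(NA, 7), rep(0, 6), rep(200 / 3, 3), rep(1100 / 3, 3), 0)

out <- result_with_zero_rule(df)
isTRUE(all.equal(out$result, expected))
```

The same helper can be run on the edited data to confirm that grp 5 ends up with zeros.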
Upvotes: 2