Reputation: 10483
I have a data frame like this:
dput(df1)
structure(list(x = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), y = c(16449L, NA, NA,
16449L, 16450L, 16451L, NA, NA, 16455L, 16456L, NA, NA, 16756L,
NA, 16460L, 16464L, 16469L, NA, NA, 16469L)), .Names = c("x",
"y"), row.names = c(NA, -20L), class = "data.frame")
I need to mutate the y
column as follows (using dplyr
):
df1 <- mutate(df1, y = ifelse(is.na(y), cummax(y), y))
However, cummax
does not handle NAs right for my case. How can I get the same effect with some alternative method?
Resulting output should have the NA rows of y
filled with the last previous non-NA value of y
. They are arranged in order.
Alternatively, I tried something like this, and it does not work:
mutate(df1, y = ifelse(is.na(y), max(y[1:row_number()], na.rm = TRUE), y)
Because row_number() itself is a vector of 1 through current row, it creates error.
EDIT: Desired output is as follows:
structure(list(x = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), y = c(16449, 16449,
16449, 16449, 16450, 16451, 16451, 16451, 16455, 16456, 16456,
16456, 16756, 16756, 16460, 16464, 16469, 16756, 16756, 16469
)), class = "data.frame", .Names = c("x", "y"), row.names = c(NA,
-20L))
Upvotes: 1
Views: 1425
Reputation: 145775
If the input starts with NA
, then the cummax
should be NA
until the first non-NA value. Here's a solution that takes that into account:
cummax_na = function(x, na.rm = TRUE) {
if(!na.rm) return(cummax(x))
if(all(is.na(x))) return(x)
# check for leading NAs to keep
first_non_na = match(TRUE, !is.na(x))
x = dplyr::coalesce(x, -Inf)
result = cummax(x)
if(first_non_na > 1) result[1:(first_non_na - 1)] = NA
result
}
cummax_na(c(1, 4, 3, 2, 1))
# [1] 1 4 4 4 4
cummax_na(c(1, 4, 3, NA, 1, 5))
# [1] 1 4 4 4 4 5
cummax_na(c(NA, 1, 4, 3, NA, 1, 5))
# [1] NA 1 4 4 4 4 5
cummax_na(c(NA, NA, 1, 4, 3, NA, 1, 5))
# [1] NA NA 1 4 4 4 4 5
Upvotes: 3
Reputation: 2939
Reduce()
would be another option:
df1 %>% mutate( y.cummax = ifelse( is.na(y) , Reduce( function(x1, x2) { max(x1,x2 ,na.rm =TRUE) } , y , accumulate = TRUE ) , y ) )
x y
1 1 16449
2 2 16449
3 3 16449
4 4 16449
5 5 16450
6 6 16451
7 7 16451
8 8 16451
9 9 16455
10 10 16456
11 1 16456
12 2 16456
13 3 16756
14 4 16756
15 5 16460
16 6 16464
17 7 16469
18 8 16756
19 9 16756
20 10 16469
Upvotes: 2
Reputation: 31171
You can do:
library(dplyr)
v = cummax(ifelse(is.na(df1$y), -Inf, df1$y)) #A. Webb suggested -Inf instead of 0, great!
mutate(df1, y=ifelse(is.na(y), v, y))
# x y
#1 1 16449
#2 2 16449
#3 3 16449
#4 4 16449
#5 5 16450
#6 6 16451
#7 7 16451
#8 8 16451
#9 9 16455
#10 10 16456
#11 1 16456
#12 2 16456
#13 3 16756
#14 4 16756
#15 5 16460
#16 6 16464
#17 7 16469
#18 8 16756
#19 9 16756
#20 10 16469
Or you can use data.table
:
setDT(transform(df1,ix=1:nrow(df1)))[,max(df1$y[1:ix],na.rm=T),by=.(ix)]
Upvotes: 6