Reputation: 89
I'm using dplyr to mutate columns in my dataframe. It consists on creating a ratio of the current row value to the max value so far (basically a lag and cummax combination). It works great. Except when there's a NA value, because all the following calculations become NA.
I tried placing na.omit() here and there but while it might work, the function fails because na.omit() messes with the length of the vectors and it crashes.
Here is my reproducible code:
v1<-c(NA,100,80,40,NA,30,100,40,20,10,NA,NA,1,NA)
v2<-c(100,100,90,50,NA,-40,NA,-10,NA,NA,NA,1,NA,NA)
group<-c(1,1,1,1,1,1,2,2,2,2,2,3,3,4)
x1<-as.data.frame(cbind(v1,v2,group))
library(dplyr)
for ( i in c("v1","v2")){
x1<-x1 %>%
group_by(group) %>%
mutate( !!sym(paste( i,"_max_lag_ratio", sep="")) := get(i)/ lag( as.vector(cummax( get(i))) , default=first(get(i))))
}
If I add na.omit() as follows:
mutate( !!sym(paste( i,"_max_lag_ratio", sep="")) := get(i)/ lag( cummax( na.omit(get(i))) , default=first( get(i) )))
I get the following error:
Error: Column `column_max_lag_ratio` must be length 1 (the group size), not 0
Most likely because of one single group (group 4) having only NAs. How can I make this failsafe? My real dataset features "imperfect" data. Help is greatly appreciated since I'm really stucked.
Upvotes: 0
Views: 549
Reputation: 89
Made this workaround and did the trick.
v1<-c(NA,100,80,40,NA,30,100,40,20,10,NA,NA,1,NA)
v2<-c(100,100,90,50,NA,-40,NA,-10,NA,NA,NA,1,NA,NA)
group<-c(1,1,1,1,1,1,2,2,2,2,2,3,3,4)
x1<-as.data.frame(cbind(v1,v2,group))
library(dplyr)
for ( i in c("v1","v2")){
x1<-x1 %>%
group_by(group) %>%
mutate( !!sym(paste( i,"_max_lag_ratio", sep="")) := get(i)/(lag( cummax( ifelse(is.na(get(i)), na.omit(get(i) ) ,get(i))) , default=first(get(i))))
)
}
Upvotes: 0
Reputation: 1124
A working solution based on this answer Need to get R cummax but dealing properly with NAs could be:
df %>%
replace_na(list(v1=-Inf, v2=-Inf)) %>%
group_by(group) %>%
mutate(max_v1 = cummax(v1),
max_v2 = cummax(v2)
) %>%
group_by(group) %>%
mutate(v1_max_lag_ratio = v1/lag(max_v1)) %>%
mutate(v2_max_lag_ratio = v2/lag(max_v2))
Upvotes: 0