Reputation: 81
What if i have a dataset like this and want to find the diff of each column which have NAs keeping the NAs and finding the difference between the existing numbers.
x y
NA 8
NA 21
5 NA
7 NA
NA NA
0 NA
NA 30
13 8
13 NA
NA 3
The desired output would be
x y
NA NA
NA 13
NA NA
2 NA
NA NA
-7 NA
NA 9
13 -22
0 NA
NA -5
Upvotes: 0
Views: 326
Reputation: 388817
You can write a function to calculate the difference between consecutive non-NA values and place them appropriately.
calculate_diff <- function(x) {
x[which(!is.na(x))] <- c(NA, diff(na.omit(x)))
x
}
This function can be applied to all the columns in base R using lapply
-
df[] <- lapply(df, calculate_diff)
df
# x y
#1 NA NA
#2 NA 13
#3 NA NA
#4 2 NA
#5 NA NA
#6 -7 NA
#7 NA 9
#8 13 -22
#9 0 NA
#10 NA -5
Or in dplyr
with across
-
library(dplyr)
df <- df %>% mutate(across(.fns = calculate_diff))
Upvotes: 1
Reputation: 2783
Using data.table
and zoo
:
df <- data.table(df)
df[, x := ifelse(!is.na(x), c(NA, diff(na.locf0(x))), NA)]
df[, y := ifelse(!is.na(y), c(NA, diff(na.locf0(y))), NA)]
Edit: elegantly in a single line:
df <- data.table(df)
df[, c("x", "y") := lapply(.SD, function(x) ifelse(!is.na(x), c(NA, diff(na.locf0(x))), NA))]
Upvotes: 1
Reputation: 351
Maybe not the best way but I think that could do the trick for you :
my_df <- data.frame(x = c(NA, NA, 5, 7, NA, 0, NA, 13, 13, NA),
y = c(8, 21, NA, NA, NA, NA, 30, 8, NA, 3),
stringsAsFactors = FALSE)
my_var <- which(!is.na(my_df$y))
if (length(my_var)) {
my_a <- my_df[my_var, "y"]
my_a <- my_a[-length(my_var)]
my_b <- my_df[my_var, "y"]
my_b <- my_b[-1]
my_c <- my_b - my_a
my_var <- my_var[-1]
my_df[my_var, "Diff"] <- my_c
}
Upvotes: 1