Maria Provelegiou
Maria Provelegiou

Reputation: 81

Find the difference of a column ignoring NAs. Find difference only of numbers

What if i have a dataset like this and want to find the diff of each column which have NAs keeping the NAs and finding the difference between the existing numbers.

x   y
NA  8
NA  21
5   NA
7   NA
NA  NA
0   NA
NA  30
13  8
13  NA
NA  3

The desired output would be

x   y
NA  NA
NA  13
NA  NA
2   NA
NA  NA
-7  NA
NA  9
13  -22
0  NA
NA  -5

Upvotes: 0

Views: 326

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 388817

You can write a function to calculate the difference between consecutive non-NA values and place them appropriately.

calculate_diff <- function(x) {
  x[which(!is.na(x))] <- c(NA, diff(na.omit(x)))
  x
}

This function can be applied to all the columns in base R using lapply -

df[] <- lapply(df, calculate_diff)
df

#    x   y
#1  NA  NA
#2  NA  13
#3  NA  NA
#4   2  NA
#5  NA  NA
#6  -7  NA
#7  NA   9
#8  13 -22
#9   0  NA
#10 NA  -5

Or in dplyr with across -

library(dplyr)
df <- df %>% mutate(across(.fns = calculate_diff))

Upvotes: 1

koolmees
koolmees

Reputation: 2783

Using data.table and zoo:

df <- data.table(df)

df[, x := ifelse(!is.na(x), c(NA, diff(na.locf0(x))), NA)]
df[, y := ifelse(!is.na(y), c(NA, diff(na.locf0(y))), NA)]

Edit: elegantly in a single line:

df <- data.table(df)

df[, c("x", "y") := lapply(.SD, function(x) ifelse(!is.na(x), c(NA, diff(na.locf0(x))), NA))]

Upvotes: 1

DataM
DataM

Reputation: 351

Maybe not the best way but I think that could do the trick for you :

my_df <- data.frame(x = c(NA, NA, 5, 7, NA, 0, NA, 13, 13, NA),
                    y = c(8, 21, NA, NA, NA, NA, 30, 8, NA, 3),
                    stringsAsFactors = FALSE)

my_var <- which(!is.na(my_df$y))
if (length(my_var)) {
  my_a <- my_df[my_var, "y"]
  my_a <- my_a[-length(my_var)]
  my_b <- my_df[my_var, "y"]
  my_b <- my_b[-1]
  my_c <- my_b - my_a
  my_var <- my_var[-1]
  my_df[my_var, "Diff"] <- my_c
}

Upvotes: 1

Related Questions