tadat

Reputation: 79

R: difference between non-NA column values

I need to calculate differences between non-NA values within each row. For example, if a row has values only in a, c, and e, and b and d are NA, I need the differences c - a and e - c, while the differences that would end in b and d stay blank. In general, each difference is taken between a non-NA value and the nearest non-NA value to its left: d1 is for the value in b (the nearest value to its left can only be a), d2 is for the value in c, d3 is for the value in d, and d4 is for the value in e.

I think I'm missing some R functions that would help in this situation. I tried writing a series of ifelse conditions that account for the preceding data point being NA, but this turns into a very long ifelse statement, for example `df$d1 <- ifelse(!is.na(df$a) & !is.na(df$b), df$b - df$a, NA)`. The farther I get from a, the more complex the ifelse statements become. I also tried `df$d1 <- (!is.na(df$b)) - (!is.na(df$a))`, but the result is not the difference; it only reflects whether each data point is NA (I get 0, 1, or -1 in the d1 column).

This is how my original database is structured:

```
a <- c(10, 20, NA, 40, 50, 60)
b <- c(5, NA, 6, 7, NA, 8)
c <- c(NA, 4, 5, NA, 7, 8)
d <- c(NA, 9, 8, 7, 6, 5)
e <- c(3, 4, NA, 5, 6, 7)
df <- data.frame(a, b, c, d, e)
```

This is how I need the result to look:

```
d1 <- c('-5', '', '', '-33', '', '-52')
d2 <- c('', '-16', '-1', '', '-43', '0')
d3 <- c('', '5', '3', '0', '-1', '-3')
d4 <- c('-2', '-5', '', '-2', '0', '2')
df1 <- data.frame(d1, d2, d3, d4)
```

Upvotes: 1

Views: 583

Answers (1)

akrun

Reputation: 887118

Here is an option. We loop over the rows with pmap (or with apply from base R, using MARGIN = 1), take the differences of adjacent non-NA elements (selected with the logical index 'i1'), bind the rows (pmap_dfr), put the columns in the correct order, and rename them.

```
library(dplyr)
library(stringr)
library(purrr)

pmap_dfr(df, ~ {
    x <- c(...)          # current row as a named vector
    i1 <- !is.na(x)      # TRUE where the row has a value
    diff(x[i1]) %>%      # differences of adjacent non-NA values
      as.list
  }) %>%
  select(sort(names(.))) %>%
  rename_all(~ str_c('d', seq_along(.)))
# A tibble: 6 x 4
#     d1    d2    d3    d4
#  <dbl> <dbl> <dbl> <dbl>
#1    -5    NA    NA    -2
#2    NA   -16     5    -5
#3    NA    -1     3    NA
#4   -33    NA     0    -2
#5    NA   -43    -1     0
#6   -52     0    -3     2
```

NOTE: Here, by default, the missing elements are filled with NA. It is better not to use blank strings ("") because that changes the column type from numeric to character.
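To illustrate the note above, here is a small sketch in base R comparing the two representations of the d1 column from the question. The variable names (`d1_numeric`, `d1_blank`, `d1_display`) are made up for this example. It also shows one possible way to get blanks for display only, while the stored column stays numeric.

```r
# The same differences for one column, stored two ways.
d1_numeric <- c(-5, NA, NA, -33, NA, -52)
d1_blank   <- c("-5", "", "", "-33", "", "-52")

class(d1_numeric)  # "numeric"
class(d1_blank)    # "character"

# Summaries still work on the NA version.
mean(d1_numeric, na.rm = TRUE)  # -30

# If blanks are needed for display, they can be produced at the end
# without changing the stored numeric column.
d1_display <- ifelse(is.na(d1_numeric), "", as.character(d1_numeric))
```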


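If some rows contain only NA (or fewer than two non-NA values), there is nothing to difference, so return NA for every column in that case:

```
pmap_dfr(df, ~ {
    x <- c(...)
    i1 <- !is.na(x)
    # at least two non-NA values are needed to form a difference
    if (sum(i1) > 1) {
      diff(x[i1]) %>%
        as.list
    } else {
      # otherwise return NA for every difference column
      set_names(rep(list(NA_real_), length(x) - 1), names(x)[-1])
    }
  }) %>%
  select(sort(names(.))) %>%
  rename_all(~ str_c('d', seq_along(.)))
```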
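The answer mentions apply from base R with MARGIN = 1 as an alternative to pmap. A minimal sketch of that route, using the data from the question, could look like the following. `row_diffs` is a hypothetical helper name, not part of any package, and the choice to return all NA for rows with fewer than two non-NA values is an assumption of this sketch.

```r
# Data from the question.
df <- data.frame(
  a = c(10, 20, NA, 40, 50, 60),
  b = c(5, NA, 6, 7, NA, 8),
  c = c(NA, 4, 5, NA, 7, 8),
  d = c(NA, 9, 8, 7, 6, 5),
  e = c(3, 4, NA, 5, 6, 7)
)

# Hypothetical helper: differences between each non-NA value and the
# nearest non-NA value to its left, for a single row.
row_diffs <- function(x) {
  out <- rep(NA_real_, length(x) - 1)  # one slot per column after the first
  idx <- which(!is.na(x))              # positions of the non-NA values
  if (length(idx) > 1) {
    # each difference is stored under the position of its right-hand value
    out[idx[-1] - 1] <- diff(x[idx])
  }
  out
}

# apply() over rows returns one column per row, so transpose the result.
res <- as.data.frame(t(apply(as.matrix(df), 1, row_diffs)))
names(res) <- paste0("d", seq_len(ncol(res)))
res
```

Because the result is pre-filled with NA and written by position, no column sorting or row binding is needed, and the columns stay numeric.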

Upvotes: 1
