Reputation: 79
I need to calculate differences between consecutive non-NA values within each row. For example, if a row has values only in a, c, and e, and b and d are NA, I need the differences c - a and e - c, and the differences belonging to b and d should be left blank. In general, d1 is the difference between the non-NA value in b and the nearest non-NA value to its left (which can only be the value in a). d2 is the difference between the non-NA value in c and the nearest non-NA value to its left. d3 is the difference between the non-NA value in d and the nearest non-NA value to its left. d4 is the difference between the non-NA value in e and the nearest non-NA value to its left.
I think I'm missing some R functions that are available to use in this situation. I tried writing a number of ifelse conditions that account for the preceding data point being NA, and it turns out to be a very long ifelse statement.
df$d1 <- ifelse(!is.na(df$a) & !is.na(df$b), df$b - df$a, NA)
But the farther I get from a, the more complex the ifelse statements get.
I also tried writing
df$d1<-(!is.na(df$b))-(!is.na(df$a))
The result is not the difference but a comparison of which values are missing (I get 0, 1, or -1 in the d1 column).
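A small illustration of why that second attempt gives 0, 1, and -1: `!is.na()` returns logical values, and subtraction coerces `TRUE`/`FALSE` to 1/0. The vectors `left` and `right` below are made-up toy data, not the data frame from this question.

```r
# Hypothetical toy vectors, only to illustrate the coercion
left  <- c(10, NA, 30, NA)
right <- c(5, 6, NA, NA)

# !is.na() yields TRUE/FALSE; arithmetic coerces these to 1/0,
# so this compares missingness, not the values themselves
flags <- (!is.na(right)) - (!is.na(left))
print(flags)

# Subtracting the values directly gives the difference,
# with NA wherever either side is missing
value_diff <- right - left
print(value_diff)
```

So the subtraction has to be applied to the values, with the `is.na()` checks used only to decide which values to pair up.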
This is how my original database is structured:
```r
a<-c(10, 20, NA, 40, 50, 60)
b<-c(5, NA, 6, 7, NA, 8)
c<-c(NA, 4, 5, NA, 7, 8)
d<-c(NA, 9, 8, 7, 6, 5)
e<-c(3, 4, NA, 5, 6, 7)
df<-data.frame(a, b, c, d, e)
```
This is how I need the result to look:
```r
d1<-c('-5', '', '', '-33', '', '-52')
d2<-c('', '-16', '-1', '', '-43', '0')
d3<-c('', '5', '3', '0', '-1', '-3')
d4<-c('-2', '-5', '', '-2', '0', '2')
df1<-data.frame(d1, d2, d3, d4)
```
Upvotes: 1
Views: 583
Reputation: 887118
Here is an option. We loop through the rows with `pmap` (or using `apply` from base R with `MARGIN = 1`), get the difference (`diff`) of adjacent non-NA elements (selected with the logical index `i1`), bind the rows (`pmap_dfr`), `select` the column names in the correct order, and `rename` the columns:
```r
library(dplyr)
library(stringr)
library(purrr)

pmap_dfr(df, ~ {
    x <- c(...)            # current row as a named vector
    i1 <- !is.na(x)        # positions of the non-NA values
    diff(x[i1]) %>%        # differences, named by the right-hand column
      as.list
  }) %>%
  select(sort(names(.))) %>%
  rename_all(~ str_c('d', seq_along(.)))
# A tibble: 6 x 4
#     d1    d2    d3    d4
#  <dbl> <dbl> <dbl> <dbl>
#1    -5    NA    NA    -2
#2    NA   -16     5    -5
#3    NA    -1     3    NA
#4   -33    NA     0    -2
#5    NA   -43    -1     0
#6   -52     0    -3     2
```
NOTE: Here, by default, the missing elements will be filled with `NA`. It is better not to use blank strings (`""`) as it changes the column type from `numeric` to `character`.
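To make the type change concrete, here is a minimal sketch. The vector `d1` is a made-up stand-in for one column of differences, not output taken from the code above.

```r
# Hypothetical numeric column of differences with one missing entry
d1 <- c(-5, NA, -33)
print(class(d1))                # "numeric"
print(mean(d1, na.rm = TRUE))   # arithmetic still works, skipping the NA

# Replacing NA with "" coerces every element to a string
d1_blank <- ifelse(is.na(d1), "", d1)
print(class(d1_blank))          # "character"
print(d1_blank)
```

Once the column is character, numeric summaries such as `mean()` no longer work on it, so blanks are probably best applied only when formatting a copy for display.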
If some rows contain only `NA`, handle them explicitly by returning `NA` for every difference:
```r
pmap_dfr(df, ~ {
    x <- c(...)
    i1 <- !is.na(x)
    if(any(i1)) {
      diff(x[i1]) %>%
        as.list
    } else set_names(rep(list(NA_real_), length(x)-1), names(x)[-1])}) %>%
  select(sort(names(.))) %>%
  rename_all(~ str_c('d', seq_along(.)))
```
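The base R route with `apply` and `MARGIN = 1` mentioned above could be sketched roughly as follows. This is an untested-in-production sketch, not a drop-in replacement: `diff_non_na` is a hypothetical helper name, and it assumes all columns are numeric. Because it always returns a vector of fixed length, rows that are entirely `NA` (or have a single non-NA value) simply come out as all `NA`.

```r
# Data from the question
df <- data.frame(
  a = c(10, 20, NA, 40, 50, 60),
  b = c(5, NA, 6, 7, NA, 8),
  c = c(NA, 4, 5, NA, 7, 8),
  d = c(NA, 9, 8, 7, 6, 5),
  e = c(3, 4, NA, 5, 6, 7)
)

# Hypothetical helper: for each non-NA value, the difference to the
# nearest non-NA value on its left; NA everywhere else
diff_non_na <- function(x) {
  out <- rep(NA_real_, length(x))
  i1 <- which(!is.na(x))
  if (length(i1) > 1) {
    out[i1[-1]] <- diff(x[i1])
  }
  out[-1]  # the first column has nothing to its left
}

# apply() returns one column per input row, so transpose the result
res <- t(apply(df, 1, diff_non_na))
colnames(res) <- paste0("d", seq_len(ncol(res)))
res <- as.data.frame(res)
print(res)
```

The columns stay numeric with `NA` for the missing differences, in line with the note above about avoiding blank strings.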
Upvotes: 1