difference between first non-NA and last non-NA in each row

Question

I have a data frame with up to 5 measurements (x) and their corresponding times:

df = structure(list(x1 = c(92.9595722286402, 54.2085219673818, 
46.3227062573019, 
NA, 65.1501442134141, 49.736451235317), time1 = c(43.2715277777778, 
336.625, 483.975694444444, NA, 988.10625, 510.072916666667), 
x2 = c(82.8368681534474, 53.7981639701784, 12.9993531230419, 
NA, 64.5678816290574, 55.331442940348), time2 = c(47.8166666666667, 
732, 506.747222222222, NA, 1455.25486111111, 958.976388888889
), x3 = c(83.5433119686794, 65.723072881366, 19.0147593408309, 
NA, 65.1989838202356, 36.7000828457705), time3 = c(86.5888888888889, 
1069.02083333333, 510.275, NA, 1644.21527777778, 1154.95694444444
), x4 = c(NA, 66.008102917677, 40.6243513885846, NA, 62.1694420909955, 
29.0078249523063), time4 = c(NA, 1379.22986111111, 520.726388888889, 
NA, 2057.20833333333, 1179.86805555556), x5 = c(NA, 61.0047472617535, 
45.324715258421, NA, 59.862110645527, 45.883161439362), time5 = c(NA, 
1825.33055555556, 523.163888888889, NA, 3352.26944444444, 
1364.99513888889)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L))

"NA" means that the person (row) didn't have a measurement.

I would like to calculate the difference between the last existing measurement and the first one.

So for the first row it would be x3 minus x1 (about -9.4), for the second it would be x5 minus x1 (about 6.8), and so on.

I tried something like this, which didn't work:

df$diff = apply(df %>% select(., contains("x")), 1, function(x) head(x, 
na.rm = T) - tail(x, na.rm=T))

Any suggestions? Also, is apply/rowwise the most efficient way, or is there a vectorized function to do that?

Ronak Shah · Accepted Answer

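A vectorized way is to use max.col, which returns the column position of each row's maximum. Applied to !is.na(...), its ties.method argument ("first" or "last") gives the position of the first or last non-NA value in each row.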

# Indices of the "x" columns
x_cols <- grep("^x", names(df))

# Column number of the first and last non-NA value in each row
first_col <- max.col(!is.na(df[x_cols]), ties.method = "first")
last_col <- max.col(!is.na(df[x_cols]), ties.method = "last")

# Subset the data frame to include only the "x" columns
new_df <- as.data.frame(df[x_cols])

# Subtract the first non-NA value from the last one
df$new_calc <- new_df[cbind(seq_len(nrow(df)), last_col)] - 
               new_df[cbind(seq_len(nrow(df)), first_col)]
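
To see why this works, here is a minimal sketch on a made-up 3x3 matrix (the values and the names m, present, rows and diffs are illustrative only, not part of the question's data). Every non-NA cell becomes TRUE, so each row's maximum is tied across all non-NA positions, and ties.method decides which of them is returned.

```r
# Toy matrix of measurements; NA marks a missing value (made-up data)
m <- matrix(c( 5,  7, NA,
              NA,  2,  9,
              NA, NA, NA),
            nrow = 3, byrow = TRUE)

present <- !is.na(m)  # TRUE where a value exists

# Each row's maximum is TRUE, so ties among the TRUEs are broken by
# position: "first" gives the first non-NA column, "last" the last one
first_col <- max.col(present, ties.method = "first")
last_col  <- max.col(present, ties.method = "last")

# Indexing with a two-column (row, column) matrix picks one value per row
rows  <- seq_len(nrow(m))
diffs <- m[cbind(rows, last_col)] - m[cbind(rows, first_col)]
```

For a row with no measurements at all, every cell is FALSE, so the two positions point at NA cells and the difference is NA.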

Using apply, you could do:

x_cols <- grep("^x", names(df))

df$new_calc <- apply(df[x_cols], 1, function(x) {
    new_x <- x[!is.na(x)]
    if (length(new_x) > 0)
      new_x[length(new_x)] - new_x[1L]
    else NA
})
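
As a quick sanity check, the two approaches can be compared on a small made-up data frame (toy, by_apply and by_max_col are hypothetical names used only for this sketch). It assumes the "x" columns are already in measurement order, as in the question.

```r
# Hypothetical toy data with the same layout as the question's "x" columns
toy <- data.frame(x1 = c(5, NA, NA), x2 = c(7, 2, NA), x3 = c(NA, 9, NA))

# Row-wise version: drop the NAs, then take last minus first
by_apply <- apply(toy, 1, function(x) {
  new_x <- x[!is.na(x)]
  if (length(new_x) > 0) new_x[length(new_x)] - new_x[1L] else NA
})

# Vectorized version: max.col positions plus (row, column) matrix indexing
mat  <- as.matrix(toy)
rows <- seq_len(nrow(mat))
by_max_col <- mat[cbind(rows, max.col(!is.na(mat), ties.method = "last"))] -
  mat[cbind(rows, max.col(!is.na(mat), ties.method = "first"))]

# Compare the two results; the all-NA row should be NA in both
same <- isTRUE(all.equal(unname(by_apply), by_max_col))
```

The apply version calls an R function once per row, while the max.col version does its work in a few whole-matrix operations, which is why it is usually the better fit for large data frames.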
