Omry Atia
Reputation: 2443

Progression of non-missing values that have missing values in-between

To continue on a previous topic: Finding non-missing values between missing values

I would also like to find whether the value before the missing value is smaller than, equal to, or larger than the one after the missing value.

To use the same example from before:

df = structure(list(FirstYStage = c(NA, 3.2, 3.1, NA, NA, 2, 1, 3.2, 
3.1, 1, 2, 5, 2, NA, NA, NA, NA, 2, 3.1, 1), SecondYStage = c(NA, 
3.1, 3.1, NA, NA, 2, 1, 4, 3.1, 1, NA, 5, 3.1, 3.2, 2, 3.1, NA, 
2, 3.1, 1), ThirdYStage = c(NA, NA, 3.1, NA, NA, 3.2, 1, 4, NA, 
1, NA, NA, 3.2, NA, 2, 3.2, NA, NA, 2, 1), FourthYStage = c(NA, 
NA, 3.1, NA, NA, NA, 1, 4, NA, 1, NA, NA, NA, 4, 2, NA, NA, NA, 
2, 1), FifthYStage = c(NA, NA, 2, NA, NA, NA, 1, 5, NA, NA, NA, 
NA, 3.2, NA, 2, 3.2, NA, NA, 2, 1)), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -20L))

Rows 13, 14 and 16 have non-missing values in between missing values. The output this time should be "same", "larger" and "same" for rows 13, 14 and 16, and "N/A" for the other rows.

Upvotes: 1

Views: 66

Answers (1)

Sotos
Reputation: 51592

A straightforward approach would be to split each pasted row, convert to numeric, take the last 2 values and compare them with an ifelse statement, i.e.

sapply(strsplit(do.call(paste, df)[c(13, 14, 16)], 'NA| '), function(i) {
  # drop the empty strings left by the split and keep the last 2 values
  v1 <- as.numeric(tail(i[i != ''], 2))
  ifelse(v1[1] > v1[2], 'greater',
         ifelse(v1[1] == v1[2], 'same', 'smaller'))
})

#[1] "same"    "smaller" "same"

NOTE

I took the previous answer as given, i.e. the rows of interest are selected with do.call(paste, df)[c(13, 14, 16)].

A more generic approach (as noted by Ronak, taking the last 2 values will fail in some cases, e.g. when the gap is not between the last two non-missing values) would be,

sapply(strsplit(gsub("([[:digit:]])+\\s+NA\\s+([[:digit:]])", '\\1_\\2',
                     do.call(paste, df)[c(13, 14, 16)]), ' '), function(i) {
  # the pair around the NA is now joined by '_'
  v1 <- i[grepl('_', i)]
  # convert to numeric so the comparison is not done on strings
  v2 <- as.numeric(strsplit(v1, '_')[[1]])
  ifelse(v2[1] > v2[2], 'greater',
         ifelse(v2[1] == v2[2], 'same', 'smaller'))
})

#[1] "same"    "smaller" "same" 
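
If you would rather not go through strings at all, here is a minimal sketch that does the same comparison on the numeric values directly and also returns "N/A" for rows without a gap. The helper name compare_across_gap is hypothetical, the small data frame only reuses rows 13 to 16 of the question's df, and the sketch assumes that only the first gap in each row matters.

```r
# Hypothetical helper: compare the values on either side of the first run of
# NAs that has a non-missing value both before and after it.
compare_across_gap <- function(x) {
  ok  <- which(!is.na(x))      # positions of the non-missing values
  gap <- which(diff(ok) > 1)   # neighbouring non-missing values separated by NAs
  if (length(gap) == 0) return("N/A")
  before <- x[ok[gap[1]]]
  after  <- x[ok[gap[1] + 1]]
  if (before > after) "greater" else if (before == after) "same" else "smaller"
}

# Rows 13, 14, 15 and 16 of the question's df (row 15 has no gap)
small_df <- data.frame(
  FirstYStage  = c(2,   NA,  NA, NA),
  SecondYStage = c(3.1, 3.2, 2,  3.1),
  ThirdYStage  = c(3.2, NA,  2,  3.2),
  FourthYStage = c(NA,  4,   2,  NA),
  FifthYStage  = c(3.2, NA,  2,  3.2)
)

res <- apply(as.matrix(small_df), 1, compare_across_gap)
res
#[1] "same"    "smaller" "N/A"     "same"
```

Running apply(as.matrix(df), 1, compare_across_gap) on the full df should then cover all 20 rows in one call, without having to know the row indices in advance.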

Upvotes: 2
