Checking for NAs by Row, then Aggregating their Column Names

Question

I currently am struggling to get information row by row on which variables turned out well/which are NA. Below, I show an example/dput output of a section of my data.

head(big_test)
# A tibble: 3 x 19
     id ctr_n   ctr    yr    mn  nvvi ENP_nat ENP_avg ENP_wght inflation1 inflation2 inflation3 inflation4 PSNS  PSNS_s PSNS_w
                                              
1  1854 Isra~   376  2019     4     1    3.50    3.50     3.50 NA         NA         NA         NA         NA     0.962 NA    
2  1855 Isra~   376  2019     9     1    2.51    2.51     2.51 NA         NA         NA         NA         NA     0.992 NA    
3  1856 Isra~   376  2020     3     1    3.78    3.78     3.78 NA         NA         NA         NA         NA     0.999 NA    
# ... with 3 more variables: PSNS_sw , local_E , cst_tot 

dput(big_test)
structure(list(id = c(1854, 1855, 1856), ctr_n = c("Israel", 
"Israel", "Israel"), ctr = c(376, 376, 376), yr = c(2019, 2019, 
2020), mn = c(4, 9, 3), nvvi = c(1, 1, 1), ENP_nat = c(3.50348063163162, 
2.51319610127466, 3.78468892335972), ENP_avg = c(3.50348063163162, 
2.51319610127466, 3.78468892335972), ENP_wght = c(3.50348063163162, 
2.51319610127466, 3.78468892335972), inflation1 = c("NA", "NA", 
"NA"), inflation2 = c("NA", "NA", "NA"), inflation3 = c("NA", 
"NA", "NA"), inflation4 = c("NA", "NA", "NA"), PSNS = c("NA", 
"NA", "NA"), PSNS_s = c(0.961748183147869, 0.992275075925835, 
0.998547438416594), PSNS_w = c("NA", "NA", "NA"), PSNS_sw = c("NA", 
"NA", "NA"), local_E = c(1, 1, 1), cst_tot = c(1, 1, 1)), row.names = c(NA, 
-3L), class = c("tbl_df", "tbl", "data.frame"))

EDIT: The NAs here are in quotes, which is inappropriate. I think the issue came from writing to .xlsx; the correct version is listed below in a dput output without the quotes on the NAs.

As you can see, the data is separated at the national level by election, where each row here should be unique (i.e., Israel, 2019, month 4). I am looking to create a character column which lists the variables missing in this output. Here is an example of the column desired:

desired_output <- tibble(missing_vars=paste("inflation1","inflation2","inflation3","inflation4","etc",sep=";"))

head(desired_output)
# A tibble: 1 x 1
  missing_vars                                   
                                            
1 inflation1;inflation2;inflation3;inflation4;etc

Thus, I am wondering if there is some sort of loop that can slice a unique election, look at missing columns, then vector those that are missing? This is crucial to automate because some variables can be missing where others are present for the same country/year. I have tried to count them, but I cannot figure out how to list those column names as a character column.

Any help is appreciate. Thank you!

Vincent · Accepted Answer

As noted by @akrun, you have "NA" strings instead of NA. Once that is fixed, you can define a missing function and apply it to each row to create a new variable:

big_test <- structure(list(id = c(1854, 1855, 1856), ctr_n = c("Israel", 
"Israel", "Israel"), ctr = c(376, 376, 376), yr = c(2019, 2019, 
2020), mn = c(4, 9, 3), nvvi = c(1, 1, 1), ENP_nat = c(3.50348063163162, 
2.51319610127466, 3.78468892335972), ENP_avg = c(3.50348063163162, 
2.51319610127466, 3.78468892335972), ENP_wght = c(3.50348063163162, 
2.51319610127466, 3.78468892335972), inflation1 = c(NA, NA, 
NA), inflation2 = c(NA, NA, NA), inflation3 = c(NA, 
NA, NA), inflation4 = c(NA, NA, NA), PSNS = c(NA, 
NA, NA), PSNS_s = c(0.961748183147869, 0.992275075925835, 
0.998547438416594), PSNS_w = c(NA, NA, NA), PSNS_sw = c(NA, 
NA, NA), local_E = c(1, 1, 1), cst_tot = c(1, 1, 1)), row.names = c(NA, 
-3L), class = c("tbl_df", "tbl", "data.frame"))

missing <- function(x) {
  idx <- is.na(unlist(x))
  paste(colnames(big_test)[idx], collapse=", ")
}

big_test$missing <- apply(big_test, 1, missing)

big_test$missing
#> [1] "inflation1, inflation2, inflation3, inflation4, PSNS, PSNS_w, PSNS_sw"
#> [2] "inflation1, inflation2, inflation3, inflation4, PSNS, PSNS_w, PSNS_sw"
#> [3] "inflation1, inflation2, inflation3, inflation4, PSNS, PSNS_w, PSNS_sw"

Checking for NAs by Row, then Aggregating their Column Names

Answers (2)

Related Questions