Reputation: 45
I would like to know (for each row) which columns any NA in my data falls under. The goal is to create a new column/variable that lists the names of the columns the data shows NA for, for that particular row, preferably using dplyr.
Using this mock data,
data = tibble(var_1 = c(NA, 4, 5, 6, 7), var_2 = c(4, 5, 6, 7, 8), var_3 = c(NA, NA, NA, 3, 5))
I'd like to create the missing_col column:
var_1 var_2 var_3 missing_col
1 NA 4 NA "var_1", "var_3"
2 4 5 NA "var_3"
3 5 6 NA "var_3"
4 6 7 3 NA
5 7 8 5 NA
My approach thus far has been to use the rowwise() function in conjunction with mutate and a nested select_if() and a function. However, none of the functions that I have tried so far have allowed me to only consider each row individually (as opposed to the entire column). Below I have included the general structure of my approach.
data %>%
rowwise() %>%
mutate(missing_col = select_if(function(x) ... )) %>%
names()
Any guidance toward the appropriate function would be appreciated.
Upvotes: 2
Views: 156
Reputation: 2521
This isn't a tidy way, but an apply seems to be simple enough:
data = tibble(var_1 = c(NA, 4, 5, 6, 7), var_2 = c(4, 5, 6, 7, 8), var_3 = c(NA, NA, NA, 3, 5))
data$missing = apply(data,1,function(x) names(x)[is.na(x)])
which outputs:
data
# A tibble: 5 x 4
var_1 var_2 var_3 missing
<dbl> <dbl> <dbl> <list>
1 NA 4.00 NA <chr [2]>
2 4.00 5.00 NA <chr [1]>
3 5.00 6.00 NA <chr [1]>
4 6.00 7.00 3.00 <chr [0]>
5 7.00 8.00 5.00 <chr [0]>
So each element of the missing column is a list containing the character vector
data$missing[[1]]
[1] "var_1" "var_3"
data$missing[[1]][1]
[1] "var_1"
Upvotes: 0
Reputation: 592
A possible solution is to use apply
directly in mutate instead of rowwise
. Perhaps an equivalent approach is possible with rowwise
, but my experience with that function is limited. The second mutate
is only necessary if you need NA
for lines without NA
.
data %>%
mutate(missing_col = apply(., 1, function(x) names(.)[is.na(x)] %>% paste(collapse = ", "))) %>%
mutate(missing_col = if_else(missing_col == "", NA_character_, missing_col))
Upvotes: 0
Reputation: 61154
> data %>%
+ mutate(missing_col = apply(., 1, function(x) which(is.na(x))) %>%
+ map_chr(., function(x) if_else(length(x)==0,
+ "NA",
+ paste(names(x), collapse=", "))))
# A tibble: 5 x 4
var_1 var_2 var_3 missing_col
<dbl> <dbl> <dbl> <chr>
1 NA 4 NA var_1, var_3
2 4 5 NA var_3
3 5 6 NA var_3
4 6 7 3 NA
5 7 8 5 NA
Upvotes: 1
Reputation: 796
You can try this:
#Unlist the results from apply
missing_col=unlist(x=apply(X=data, MARGIN=1, FUN=function(x){
#Get the names of the rows which have NA
NamesNA=names(which(is.na(x)))
#If there's no NA then replace the 'character(0)' for NA.
if(length(NamesNA)!=0){
#Concatenate names if there are more than one
paste0(NamesNA, collapse=",")
}else{
#Replace 'character(0)'
NA
}
})
)
#Add column with desired output
data$missing_col=missing_col
This give the desired output:
# A tibble: 5 x 4
var_1 var_2 var_3 missing_col
<dbl> <dbl> <dbl> <chr>
1 NA 4 NA var_1,var_3
2 4 5 NA var_3
3 5 6 NA var_3
4 6 7 3 NA
5 7 8 5 NA
Upvotes: 1