Lucca Ramalho

Reputation: 593

Add logical value inside new column according to 'NA' values

Let's say I have this data:

name <- c("Name1","Name2","Name3","Name4",NA)
state <- c("State1","State2","State3","State4","State5")
id <- c("id1",NA,NA,"id4","id5")
size <- c(NA,"size2",NA,"size4",NA)

and then I create this data frame:

df <- data.frame(name,state,id,size)

> df
   name  state  id  size
1 Name1 State1 id1  NA    
2 Name2 State2 NA   size2
3 Name3 State3 NA   NA    
4 Name4 State4 id4  size4
5 NA    State5 id5  NA    

The type of each column is defined in a second data frame, like this:

vars <- c("name","state","id","size")
type <- c("A","A","B","C")

class <- data.frame(vars,type)

> class
   vars type
1  name    A
2 state    A
3    id    B
4  size    C

What I want to do is create one new column per type, named after the type, holding a logical value: if at least one of the columns of that type is not NA in a given row, the new column should be TRUE for that row, like this:

   name  state  id  size  A     B      C
 1 Name1 State1 id1 NA    TRUE  TRUE   FALSE
 2 Name2 State2 NA  size2 TRUE  FALSE  TRUE
 3 Name3 State3 NA  NA    TRUE  FALSE  FALSE
 4 Name4 State4 id4 size4 TRUE  TRUE   TRUE
 5 NA    State5 id5 NA    TRUE  TRUE   FALSE

How can I get the desired output?

Upvotes: 1

Views: 48

Answers (1)

akrun

Reputation: 886948

We can split the 'vars' column by 'type' in the 'class' dataset (note that class is also a function name), loop through the resulting list, and subset the columns of 'df' named in each list element. Comparing that subset with a blank ("") gives a logical matrix. We then take its rowSums and compare the result with the number of columns in the subset, i.e. we check whether the number of TRUE values equals the number of columns. This version assumes that missing values are stored as blank strings, and it returns TRUE only when every column of a type is non-blank in that row.

cbind(df, sapply(split(as.character(class$vars), class$type),
             function(x) rowSums(df[x] != "") == ncol(df[x])))
#   name  state  id  size     A     B     C
#1 Name1 State1 id1        TRUE  TRUE FALSE
#2 Name2 State2     size2  TRUE FALSE  TRUE
#3 Name3 State3            TRUE FALSE FALSE
#4 Name4 State4 id4 size4  TRUE  TRUE  TRUE
#5       State5 id5       FALSE  TRUE FALSE
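
To make the splitting step concrete, here is a minimal sketch of what split() returns for the lookup table from the question. The object name col_types is a hypothetical replacement for 'class', chosen only so that the base function class() is not masked; stringsAsFactors = FALSE is set so that the as.character() call is not needed.

```r
# Lookup table from the question, stored under a hypothetical name
# (col_types) so that it does not mask base::class()
col_types <- data.frame(vars = c("name", "state", "id", "size"),
                        type = c("A", "A", "B", "C"),
                        stringsAsFactors = FALSE)

# split() groups the column names by type and returns a named list:
# one element per type, each holding the names of the columns of that type
groups <- split(col_types$vars, col_types$type)

names(groups)  # the types: "A" "B" "C"
groups$A       # the columns of type A: "name" "state"
```

Each element of this list is then used to subset 'df', which is why the function passed to sapply receives a character vector of column names.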

Another option, without using split, is to loop through the 'unique' elements of the 'type' column in 'class' and then do the subsetting

library(tidyverse)
class %>%
    pull(type) %>%
    unique %>% 
    map(~ class %>%
              filter(type == .x) %>% 
              pull(vars) %>% 
              as.character %>% 
              select(df, .) %>%
               `!=`("") %>%
              as_tibble %>%
              reduce(`&`)) %>%
    bind_cols(df, .) 

Update

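Based on the updated dataset in the OP's post, which uses NA elements, we replace df[x] != "" with !is.na(df[x]). We also compare the rowSums with 0 instead of the number of columns, so the result is TRUE when at least one column of the type is not NA in that row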

cbind(df, sapply(split(as.character(class$vars), class$type),
         function(x) rowSums(!is.na(df[x])) >0))
#   name  state   id  size    A     B     C
#1 Name1 State1  id1  <NA> TRUE  TRUE FALSE
#2 Name2 State2 <NA> size2 TRUE FALSE  TRUE
#3 Name3 State3 <NA>  <NA> TRUE FALSE FALSE
#4 Name4 State4  id4 size4 TRUE  TRUE  TRUE
#5  <NA> State5  id5  <NA> TRUE  TRUE FALSE
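
For reference, the whole approach can be put together as a self-contained sketch. It uses the data from the question; the names col_types, groups, flags and result are hypothetical and introduced only for illustration, and stringsAsFactors = FALSE is an assumption that keeps the columns as character vectors.

```r
# Data from the question
df <- data.frame(
  name  = c("Name1", "Name2", "Name3", "Name4", NA),
  state = c("State1", "State2", "State3", "State4", "State5"),
  id    = c("id1", NA, NA, "id4", "id5"),
  size  = c(NA, "size2", NA, "size4", NA),
  stringsAsFactors = FALSE
)

# Column-to-type lookup table (hypothetical name, avoids masking class())
col_types <- data.frame(vars = c("name", "state", "id", "size"),
                        type = c("A", "A", "B", "C"),
                        stringsAsFactors = FALSE)

# Named list: type -> names of the columns of that type
groups <- split(col_types$vars, col_types$type)

# For each type, count the non-NA cells per row across its columns;
# a count above 0 means at least one column of that type has a value
flags <- sapply(groups, function(cols) rowSums(!is.na(df[cols])) > 0)

# Append the logical columns A, B and C to the original data
result <- cbind(df, flags)
result
```

Replacing > 0 with == length(cols) would instead require every column of the type to be non-NA, which is the stricter check used in the first version of the answer.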

Upvotes: 1
