Reputation: 593
Let's say I have this data:
name <- c("Name1","Name2","Name3","Name4",NA)
state <- c("State1","State2","State3","State4","State5")
id <- c("id1",NA,NA,"id4","id5")
size <- c(NA,"size2",NA,"size4",NA)
and then I create this data frame:
df <- data.frame(name,state,id,size)
> df
name state id size
1 Name1 State1 id1 NA
2 Name2 State2 NA size2
3 Name3 State3 NA NA
4 Name4 State4 id4 size4
5 NA State5 id5 NA
And the class of the columns is defined in a data frame like this:
vars <- c("name","state","id","size")
type <- c("A","A","B","C")
class <- data.frame(vars,type)
> class
vars type
1 name A
2 state A
3 id B
4 size C
What I want to do is create one more column per type, named after the type, holding a logical value: if at least one of the columns of that type is not NA, it should be TRUE, like this:
name state id size A B C
1 Name1 State1 id1 NA TRUE TRUE FALSE
2 Name2 State2 NA size2 TRUE FALSE TRUE
3 Name3 State3 NA NA TRUE FALSE FALSE
4 Name4 State4 id4 size4 TRUE TRUE TRUE
5 NA State5 id5 NA TRUE TRUE FALSE
How can I get the desired output?
Upvotes: 1
Views: 48
Reputation: 886948
We can split the 'vars' column by the 'type' in the 'class' dataset (note that class is also a function name), loop through the resulting list, subset the 'df' columns named in 'vars', convert them to a logical matrix by checking that the values are not equal to a blank, get the rowSums, and create a logical vector by comparing it with the number of columns of the subset, i.e. we check whether the number of TRUE values equals the number of columns, so a row is TRUE for a type only when every column of that type is non-blank:
cbind(df, sapply(split(as.character(class$vars), class$type),
function(x) rowSums(df[x] != "") == ncol(df[x])))
#   name  state  id  size     A     B     C
#1 Name1 State1 id1        TRUE  TRUE FALSE
#2 Name2 State2     size2  TRUE FALSE  TRUE
#3 Name3 State3            TRUE FALSE FALSE
#4 Name4 State4 id4 size4  TRUE  TRUE  TRUE
#5       State5 id5       FALSE  TRUE FALSE
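To make the steps above easier to follow, here is a minimal sketch that runs them one at a time. It assumes the original data had blanks ("") where the updated question has NA, and it uses the hypothetical name col_types instead of class so the base function is not masked.

```r
# Assumed original data: blanks ("") where the updated question has NA
df <- data.frame(
  name  = c("Name1", "Name2", "Name3", "Name4", ""),
  state = c("State1", "State2", "State3", "State4", "State5"),
  id    = c("id1", "", "", "id4", "id5"),
  size  = c("", "size2", "", "size4", ""),
  stringsAsFactors = FALSE
)

# 'col_types' is a hypothetical name standing in for 'class'
col_types <- data.frame(
  vars = c("name", "state", "id", "size"),
  type = c("A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Step 1: one character vector of column names per type
groups <- split(col_types$vars, col_types$type)

# Step 2: logical matrix for one group, TRUE where the cell is not blank
not_blank_A <- df[groups$A] != ""

# Step 3: a row passes only if every column in the group is non-blank
all_A <- rowSums(not_blank_A) == ncol(not_blank_A)

# Steps 2 and 3 applied to every type at once, then bound to df
flags <- sapply(groups, function(x) rowSums(df[x] != "") == ncol(df[x]))
result <- cbind(df, flags)
```

Because sapply is given a named list, the resulting matrix already has columns A, B and C, so cbind needs no renaming step.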
Another option, without using split, would be to loop through the unique elements of the 'type' column in 'class' and then do the subsetting:
library(tidyverse)
class %>%
  pull(type) %>%
  unique %>%
  as.character %>%
  set_names %>%   # name each element by its type so the new columns are A, B, C
  map(~ class %>%
        filter(type == .x) %>%
        pull(vars) %>%
        as.character %>%
        select(df, .) %>%
        `!=`("") %>%
        as_tibble %>%
        reduce(`&`)) %>%
  bind_cols(df, .)
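For readers who prefer to avoid the extra packages, the same loop over unique types can be sketched in base R. This is only an illustration under the same assumptions as the blank-string version (data with "" instead of NA, and the hypothetical name col_types in place of class).

```r
# Assumed original data: blanks ("") where the updated question has NA
df <- data.frame(
  name  = c("Name1", "Name2", "Name3", "Name4", ""),
  state = c("State1", "State2", "State3", "State4", "State5"),
  id    = c("id1", "", "", "id4", "id5"),
  size  = c("", "size2", "", "size4", ""),
  stringsAsFactors = FALSE
)

# 'col_types' is a hypothetical name standing in for 'class'
col_types <- data.frame(
  vars = c("name", "state", "id", "size"),
  type = c("A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

types <- unique(col_types$type)

# For each type, AND together the "not blank" vectors of its columns
flags <- lapply(types, function(tp) {
  cols <- col_types$vars[col_types$type == tp]
  Reduce(`&`, lapply(df[cols], function(col) col != ""))
})
names(flags) <- types

result <- cbind(df, as.data.frame(flags))
```

Reduce with `&` plays the role of reduce(`&`) in the pipeline: a type is TRUE for a row only when all of its columns are non-blank.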
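Based on the updated dataset in the OP's post with NA elements, we replace df[x] != "" with !is.na(df[x]). Since one non-NA value per type is enough, we also compare the row sums with 0 instead of with the number of columns: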
cbind(df, sapply(split(as.character(class$vars), class$type),
function(x) rowSums(!is.na(df[x])) >0))
# name state id size A B C
#1 Name1 State1 id1 <NA> TRUE TRUE FALSE
#2 Name2 State2 <NA> size2 TRUE FALSE TRUE
#3 Name3 State3 <NA> <NA> TRUE FALSE FALSE
#4 Name4 State4 id4 size4 TRUE TRUE TRUE
#5 <NA> State5 id5 <NA> TRUE TRUE FALSE
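As a quick check of the NA version, the sketch below wraps it in a small helper and compares the result with the desired output from the question. The helper name add_type_flags and the name col_types (used instead of class) are hypothetical, chosen only for this illustration.

```r
df <- data.frame(
  name  = c("Name1", "Name2", "Name3", "Name4", NA),
  state = c("State1", "State2", "State3", "State4", "State5"),
  id    = c("id1", NA, NA, "id4", "id5"),
  size  = c(NA, "size2", NA, "size4", NA),
  stringsAsFactors = FALSE
)

# 'col_types' is a hypothetical name standing in for 'class'
col_types <- data.frame(
  vars = c("name", "state", "id", "size"),
  type = c("A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: adds one logical column per type, TRUE when at
# least one column of that type is not NA in the row
add_type_flags <- function(data, types) {
  groups <- split(as.character(types$vars), types$type)
  # data[x] (not data[, x]) stays a data frame even for a single column,
  # so rowSums works for every group
  flags <- sapply(groups, function(x) rowSums(!is.na(data[x])) > 0)
  cbind(data, flags)
}

result <- add_type_flags(df, col_types)

# Desired output from the question
expected_A <- c(TRUE, TRUE, TRUE, TRUE, TRUE)
expected_B <- c(TRUE, FALSE, FALSE, TRUE, TRUE)
expected_C <- c(FALSE, TRUE, FALSE, TRUE, FALSE)
stopifnot(
  identical(result$A, expected_A),
  identical(result$B, expected_B),
  identical(result$C, expected_C)
)
```

Row 5 is the case that separates the two versions: name is NA but state is not, so A is TRUE here, whereas the all-columns check would give FALSE.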
Upvotes: 1