McMahok
McMahok

Reputation: 360

Fill a new column based on whether two other cols are filled or empty

Here are some example data to illustrate my problem

dput(example)
structure(list(ID1 = c("", "abj6498", "", "", "", "", "", "", 
"avl5803", "avl3030", "vz30775", "", "", "s883612", "avl7884", 
"", "", "avl7905", "", "", "", "", "ty68281", "", "", "z101246", 
"", "", "", "", "afh2697", "", "", "", "", "avl3037", "", "", 
"", "tw77193", "", "", "tw77053", "avl6702", "", "abj6256", "", 
"", "", ""), ID2 = c("", "and7457", "", "", "", "", "", "", "", 
"j215736", "vz33290", "", "", "afh4845", "", "", "", "j215809", 
"", "", "", "", "ty69088", "", "", "j215813", "", "", "avl6192", 
"", "j215807", "", "avl5585", "", "", "avl5441", "", "", "", 
"ty70093", "", "", "tr43355", "avl3013", "", "afh3143", "", "", 
"", "")), row.names = c(NA, -50L), class = "data.frame")

I have the dataframe example which has two cols, ID1 and ID2 all I need is a new column present based on whether or not either or both of ID1 and ID2 contain values

e.g. If ID1 is full but ID2 is not then the new column present should have the value 1 and vice versa, if both contain a value then present would contain 2

I have tried various methods using tidyverse::mutate and if statements but can't quite get the logic right.

Upvotes: 1

Views: 951

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389245

We can use rowSums to count number of values which are not empty in a row. This will return 0 for those values that are both empty, you can turn them to NA if needed.

example$present <- rowSums(example != '')
head(example, 10)

#       ID1     ID2 present
#1                        0
#2  abj6498 and7457       2
#3                        0
#4                        0
#5                        0
#6                        0
#7                        0
#8                        0
#9  avl5803               1
#10 avl3030 j215736       2

Upvotes: 2

TarJae
TarJae

Reputation: 79224

We could replace empty with NA and check with an case_when statement:

library(dplyr)
df %>%
  mutate(across(everything(), ~ifelse(.=="", NA, as.character(.)))) %>% 
  mutate(present = case_when(!is.na(ID1) & !is.na(ID2) ~ 2,
                             !is.na(ID1) | !is.na(ID2) ~ 1)) 

output:

   ID1     ID2     present
   <chr>   <chr>     <dbl>
 1 NA      NA           NA
 2 abj6498 and7457       2
 3 NA      NA           NA
 4 NA      NA           NA
 5 NA      NA           NA
 6 NA      NA           NA
 7 NA      NA           NA
 8 NA      NA           NA
 9 avl5803 NA            1
10 avl3030 j215736       2
# … with 40 more rows

Upvotes: 3

Related Questions