vp_050
vp_050

Reputation: 508

Apply strsplit and ifelse condition accross multiple columns in R

I have a dataframe as shown below and the output from this code is shown below.

Question: Since in the original dataframe, there can be multiple columns of value format type as in col1 and col2, how the code below can be modified to get the desired output?

#STEP1

df <- data.frame(
  col1 = c("abc_1_102", "abc_1_103", "xyz_1_104"),
  col2 = c("107", "108", "106")
)

#STEP2

split_text <- strsplit(df$col1, "_")

third_elements <- sapply(split_text, function(x) if(length(x) >= 3) x[3] else NA)

#STEP3

df$col3<-third_elements

#STEP4

selection<-c(107,102,108)

df$col4<-ifelse(df$col2 %in% selection,"SELECT","NOTSELECT")

df$col5<-ifelse(df$col3 %in% selection,"SELECT","NOTSELECT")

#STEP5

df$col6<-paste(df$col4,df$col5,sep = ",")

Output from above code:

       col1 col2 col3      col4      col5                col6
1 abc_1_102  107  102    SELECT    SELECT       SELECT,SELECT
2 abc_1_103  108  103    SELECT NOTSELECT    SELECT,NOTSELECT
3 xyz_1_104  106  104 NOTSELECT NOTSELECT NOTSELECT,NOTSELECT

Desired output

       col1 col2                col6
1 abc_1_102  107       SELECT,SELECT
2 abc_1_103  108    NOTSELECT,SELECT
3 xyz_1_104  106 NOTSELECT,NOTSELECT

Upvotes: 2

Views: 65

Answers (2)

jpsmith
jpsmith

Reputation: 17145

You can do this all in one go with by pasting two ifelse statements together. The ifelse for col2 is straightforward. The ifelse for col3 uses grepl to search for any of the numbers in select by creating a search string using paste(..., collapse = "|") (pasting the "or" operator between them). The outer paste(..., sep = ",") puts it all together:

df$col6 <- paste(ifelse(df$col2 %in% selection, "SELECT", "NOTSELECT"),
                 ifelse(grepl(paste(selection, collapse = "|"), df$col1), "SELECT", "NOTSELECT"),
                 sep = ",")

Or to more safely add a word boundary to the second ifelse (thanks to @r2evans!)

df$col6 <- paste(ifelse(df$col2 %in% selection, "SELECT", "NOTSELECT"),
                 ifelse(grepl(
                   paste0("(^|_)(", paste(selection, collapse = "|"), ")(_|$)\\b"), 
                   df$col1),
                   "SELECT", "NOTSELECT"),
                 sep = ",")

Both give the same output in this example:

       col1 col2                col6
1 abc_1_102  107       SELECT,SELECT
2 abc_1_103  108    SELECT,NOTSELECT
3 xyz_1_104  106 NOTSELECT,NOTSELECT

Upvotes: 3

r2evans
r2evans

Reputation: 160387

df$col6 <- paste(
  ifelse(df$col2 %in% selection, "SELECT", "NOTSELECT"),
  strsplit(df$col1, "_") |>
    sapply(`%in%`, selection) |>
    colSums() |>
    ifelse("SELECT", "NOTSELECT"),
  sep = ",")
#        col1 col2                col6
# 1 abc_1_102  107       SELECT,SELECT
# 2 abc_1_103  108    SELECT,NOTSELECT
# 3 xyz_1_104  106 NOTSELECT,NOTSELECT

Upvotes: 2

Related Questions