mayre

Reputation: 3

How to find a word in a column

I have code that fills a new column with Y or N: if ‘Keywords’ contains ‘Complete’ (ideally also variations of the word such as complete or Complet) the value should be Y, otherwise N. My problem is that the code does not work when the word is in upper case or is separated from other words by ",", "/" or "-". Can you help?

id_base_full$Completion.Flag <- 0

x <- nrow(id_base_full)

for(i in 1:x){

  if (grepl("Complete",id_base_full$HI.Keywords[i])){
    id_base_full$Completion.Flag[i] <- "Y"
  }else if (grepl("complete",id_base_full$HI.Keywords[i])){
    id_base_full$Completion.Flag[i] <- "Y"
  }else if (grepl("Complet" ,id_base_full$HI.Keywords[i])){
    id_base_full$Completion.Flag[i] <- "Y"
  }else{ 
    id_base_full$Completion.Flag[i] <- "N" 
  }
} 

Upvotes: 0

Views: 40

Answers (1)

Tensibai

Reputation: 15784

Something like this should achieve what you want:

id_base_full$Completion.Flag <- "N"
id_base_full$Completion.Flag[grepl("complete?", id_base_full$HI.Keywords, ignore.case = TRUE)] <- "Y"

The idea is to create the column with "N" everywhere and then, for the rows where the word complet (with an optional e at the end) is found, set the value to "Y". Because grepl is vectorized over the whole column, no for loop is needed.

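In a regex, ? means 0 or 1 occurrence of the preceding character (e here), and ignore.case=TRUE makes the match case-insensitive. grepl searches anywhere inside each string, so separators such as ",", "/" or "-" around the word do not matter. It returns a logical vector of TRUE/FALSE, which is then used to select the matching rows.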
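As a quick check of the pattern, here is a minimal sketch run on a handful of made-up keyword strings. The vector `keywords` and its values are hypothetical and only stand in for `id_base_full$HI.Keywords`.

```r
# Hypothetical sample values; the real data lives in id_base_full$HI.Keywords
keywords <- c("Complete", "COMPLETE/Done", "draft,complet",
              "in-progress", "Partly-Completed", NA)

# Case-insensitive search for "complet" followed by an optional "e"
hits <- grepl("complete?", keywords, ignore.case = TRUE)

# One TRUE/FALSE per element; grepl reports FALSE for NA entries
print(data.frame(keywords, hits))
```

Note that "Partly-Completed" also matches, since the pattern only requires the substring "complet" to appear somewhere in the string.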

Rather than Y/N, I'd keep the logical values directly in the resulting data frame, which is more straightforward to work with:

id_base_full$Completion.Flag <- grepl("complete?", id_base_full$HI.Keywords, ignore.case = TRUE)
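To illustrate why a logical column can be more convenient, the sketch below uses a small toy data frame. Its contents, and the names `n_complete`, `complete_rows` and `Completion.Label`, are assumptions made for the example and are not part of the original data.

```r
# Hypothetical toy data frame standing in for the real id_base_full
id_base_full <- data.frame(
  HI.Keywords = c("Complete", "pending", "review/COMPLET", "n-a"),
  stringsAsFactors = FALSE
)

id_base_full$Completion.Flag <- grepl("complete?", id_base_full$HI.Keywords,
                                      ignore.case = TRUE)

# A logical column can be summed or used directly as a row filter
n_complete    <- sum(id_base_full$Completion.Flag)
complete_rows <- id_base_full[id_base_full$Completion.Flag, ]

# If a Y/N label is still wanted for reporting, derive it from the flag
id_base_full$Completion.Label <- ifelse(id_base_full$Completion.Flag, "Y", "N")
```

Keeping the flag as TRUE/FALSE and deriving the Y/N label only when it is needed avoids string comparisons such as `== "Y"` later on.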

Upvotes: 1
