Reputation: 3
I have code to put Y or N in a new column: if 'Keywords' contains 'Complete' (and, if possible, variations of the word such as complete or Complet), then Y, else N. My problem is that when the word is in upper case or separated by ",", "/" or "-", the code does not work. Can you help?
id_base_full$Completion.Flag <- 0
x <- nrow(id_base_full)
for(i in 1:x){
    if (grepl("Complete", id_base_full$HI.Keywords[i])){
        id_base_full$Completion.Flag[i] <- "Y"
    }else if (grepl("complete", id_base_full$HI.Keywords[i])){
        id_base_full$Completion.Flag[i] <- "Y"
    }else if (grepl("Complet", id_base_full$HI.Keywords[i])){
        id_base_full$Completion.Flag[i] <- "Y"
    }else{
        id_base_full$Completion.Flag[i] <- "N"
    }
    next [i]
}
Upvotes: 0
Views: 40
Reputation: 15784
Something like this should achieve what you want:
id_base_full$Completion.Flag <- "N"
id_base_full$Completion.Flag[grepl("complete?", ignore.case=TRUE, id_base_full$HI.Keywords)] <- "Y"
The idea is to create the column with "N" everywhere and then, for the rows where the word complet (with an optional e at the end) is found, set the value to "Y".
In regex, the ? means 0 or 1 occurrence of the preceding character (e here). grepl returns a logical vector of TRUE/FALSE, which is used to select the matching rows.
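To see how the pattern behaves, here is a minimal sketch on a made-up vector. The sample strings are assumptions for illustration, not taken from the question's data:

```r
# Hypothetical keyword strings, for illustration only
keywords <- c("Complete", "COMPLETE/other", "in-complet", "foo,complete", "pending", NA)

# TRUE where "complet" or "complete" appears, regardless of case;
# separators such as "/", "-" or "," do not matter because grepl matches substrings
hits <- grepl("complete?", keywords, ignore.case = TRUE)
print(hits)
```

Note that, because the match is on substrings, words such as "Incomplete" or "Completed" would be flagged too. If that is not wanted, the pattern would probably need word boundaries or a stricter form.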
To be more straightforward than Y/N, I'd keep the Boolean values in the resulting dataset with:
id_base_full$Completion.Flag <- grepl("complete?", ignore.case=TRUE, id_base_full$HI.Keywords)
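As a rough sketch of why the logical column can be handier, the example below uses a small hypothetical data frame (df stands in for id_base_full, and Completion.Label is a made-up column name). It shows that Y/N labels can still be derived with ifelse when they are needed:

```r
# Hypothetical stand-in for id_base_full, for illustration only
df <- data.frame(HI.Keywords = c("Complete", "pending", "COMPLET-x"),
                 stringsAsFactors = FALSE)

# Logical flag: TRUE where the keyword matches
df$Completion.Flag <- grepl("complete?", df$HI.Keywords, ignore.case = TRUE)

# Derive Y/N labels only where they are needed (e.g. for a report)
df$Completion.Label <- ifelse(df$Completion.Flag, "Y", "N")

# A logical column can be counted and used for filtering directly
n_complete <- sum(df$Completion.Flag)
complete_rows <- df[df$Completion.Flag, ]
```

With a logical column, counting and subsetting need no comparison against "Y".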
Upvotes: 1