Reputation: 25
I just have a column "methods_discussed" in CSV (link is https://github.com/pandas-dev/pandas/files/3496001/multiple_responses.zip)
multi<- read.csv("multiple_responses.csv", header = T)
This file having values name of family planning methods in the column name like:
methods_discussed
emergency female_sterilization male_sterilization iud NaN injectables male_condoms -77 male_condoms female_sterilization male_sterilization injectables iud male_condoms
I have created a vector of all but not -77 and NAN of 8 family planning methods as:
method_names = c('female_condoms', 'emergency', 'male_condoms', 'pill', 'injectables', 'iud', 'male_sterilization', 'female_sterilization')
I want to create new indicator variable based on the names of vector (method_names) in the existing data frame multi2, for this I used (I)
for (abc in method_names) {
multi2[abc]<- as.integer(str_detect(multi2$methods_discussed, fixed(abc)))
}
(II)
for (abc in method_names) {
multi2[abc]<- as.integer(str_contains(abc,multi2$methods_discussed))
}
(III) I also tried
for (abc in method_names) {
multi2[abc]<- as.integer(stri_detect_fixed(multi2$methods_discussed, abc))
}
but the output is not matching as expected. Probably male_sterilization is a substring of female_sterilization and it shows 1(TRUE) for male_sterilization for female_sterlization also. It is shown below in the Actual output at row 2. It must show 0 (FALSE) as female_sterilization is in the method_discussed column at row 2. I also don't want to generate any thing like 0/1 (False/True) (should be blank) corresponding to -77 and blank in method_discussed (All are highlighted in Expected output.
Expected Output
No error in code but only in the output.
Upvotes: 0
Views: 215
Reputation: 389315
You can add word boundaries to fix that issue.
multi<- read.csv("multiple_responses.csv", header = T)
method_names = c('female_condoms', 'emergency', 'male_condoms', 'pill', 'injectables', 'iud', 'male_sterilization', 'female_sterilization')
for (abc in method_names) {
multi[abc]<- as.integer(grepl(paste0('\\b', abc, '\\b'), multi$methods_discussed))
}
multi[multi$methods_discussed %in% c('', -77), method_names] <- ''
Upvotes: 1