Reputation: 770
This feels like it should be an easy question, but I have looked here and other places and can't find a simple answer.
I have survey responses and I need to create a 1/0 dummy for source of the response. I am trying to create a simple flag variable by looking through all data in the comment field, and if the substring matches, flag it 1.
Example data:
ID comment
1 rubber chickens
2 180107 RG - email taken from 2017 graduate survey
I need R to look through the comment field and, whenever it sees the phrase 'graduate survey', code my grad_svy field as 1, otherwise 0.
When I write
data$grad_svy <- ifelse((substr(data$comment,34,49) == "graduate survey"),1,0)
It runs, but it doesn't mark anything as 1, when in fact there are hundreds of rows that should be marked 1. I know the two-word phrase begins at 34 and ends at 49 for every instance in the field. I'm not sure what I'm doing wrong; the documentation for ifelse and substr has been pretty unhelpful.
Upvotes: 1
Views: 2854
Reputation: 1311
You may want to use grepl and data.table for things like this. For example:
library(data.table)
setDT(data)
data[, grad_svy := as.numeric(grepl("graduate survey", comment))]
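To check this on the two sample rows from the question, here is a small self-contained sketch. It assumes the data.table package is installed; the table is rebuilt by hand from the question's example, and fixed = TRUE and as.integer are optional choices of mine (literal matching instead of a regular expression, and an integer flag instead of a double), not something the original code requires.

```r
library(data.table)

# Sample rows rebuilt by hand from the question (illustrative only)
data <- data.table(
  ID = 1:2,
  comment = c("rubber chickens",
              "180107 RG - email taken from 2017 graduate survey")
)

# fixed = TRUE treats the pattern as a literal string, not a regex;
# as.integer turns TRUE/FALSE into 1L/0L
data[, grad_svy := as.integer(grepl("graduate survey", comment, fixed = TRUE))]
```

After this runs, grad_svy should be 0 for the first row and 1 for the second.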
Upvotes: 1
Reputation: 5689
You can try this, which uses only base R:
data$grad_svy <- as.numeric(grepl("graduate survey", data$comment))
grepl returns a logical vector that is TRUE wherever the pattern "graduate survey" is found in data$comment. as.numeric then converts that logical vector into numbers for you: TRUE becomes 1 and FALSE becomes 0.
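As for why the substr version flags nothing: on the sample row as it is shown in the question, positions 34 to 49 appear to cover 16 characters, starting with the space before "graduate", while "graduate survey" is only 15 characters long, so the == comparison can never be TRUE. A minimal sketch to check this, assuming the comment text is exactly as posted (the data frame below is rebuilt by hand, and piece is just an illustrative name):

```r
# Sample rows rebuilt by hand from the question (illustrative only)
data <- data.frame(
  ID = 1:2,
  comment = c("rubber chickens",
              "180107 RG - email taken from 2017 graduate survey"),
  stringsAsFactors = FALSE
)

# Inspect what substr() actually extracts for positions 34 to 49
piece <- substr(data$comment[2], 34, 49)
nchar(piece)                 # 16, one more than nchar("graduate survey")
piece == "graduate survey"   # FALSE, because of the leading space

# Position-independent flag: no character offsets needed
data$grad_svy <- as.numeric(grepl("graduate survey", data$comment, fixed = TRUE))
```

If the real data has different spacing, the offsets may differ, which is one more reason to prefer grepl over fixed positions here.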
Upvotes: 2