Reputation: 982
How can I get the functionality of grepl within a vectorized conditional statement?
I wish to transform raw data of geographical divisions (mixed with other categories) by prepending the city name where it is missing from district names:
#Build index dataframe
(index <- data.frame(div_raw=c("Brussels", "Paris", "Paris I", "II", "total"),
city=c("Brussels", "Paris", "Paris", "Paris", NA)))
# div_raw city
#1 Brussels Brussels
#2 Paris Paris
#3 Paris I Paris
#4 II Paris
#5 total <NA>
#Prepend city name to district names, where available
index$div <- with(index, paste(ifelse(div_raw != city & !is.na(city), city, ""), div_raw))
index
# div_raw city div
#1 Brussels Brussels Brussels
#2 Paris Paris Paris
#3 Paris I Paris Paris Paris I
#4 II Paris Paris II
#5 total <NA> total
As can be seen, we should also test whether the city is already included in the district name. However, grepl is not vectorized over its pattern argument: given a vector of patterns, it uses only the first element instead of matching each pattern against its corresponding string:
index$div <- with(index, paste(ifelse(div_raw != city & !is.na(city) & !grepl(city, div_raw), city, ""), div_raw))
#Warning message:
#In grepl(city, div_raw) :
# argument 'pattern' has length > 1 and only the first element will be used
index
# div_raw city div
#1 Brussels Brussels Brussels
#2 Paris Paris Paris
#3 Paris I Paris Paris Paris I
#4 II Paris Paris II
#5 total <NA> total
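The warning can be reproduced in isolation. The following is a minimal sketch with made-up vectors (pat and x are hypothetical names, not part of the data above); it suggests that a multi-element pattern behaves as if only its first element had been passed:

```r
# Hypothetical toy vectors, not part of the data above
pat <- c("a", "b")
x   <- c("a", "b")

# grepl() silently falls back to pat[1]; the warning is suppressed here
# only to keep the sketch quiet
first_only <- suppressWarnings(grepl(pat, x))

# Same result as passing the first pattern explicitly
same_as_first <- identical(first_only, grepl(pat[1], x))
```

So "b" is tested against the pattern "a" rather than against "b", which is why the check fails for every row whose city differs from the first one.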
The expected result:
index
# div_raw city div
#1 Brussels Brussels Brussels
#2 Paris Paris Paris
#3 Paris I Paris Paris I
#4 II Paris Paris II
#5 total <NA> total
Upvotes: 1
Views: 63
Reputation: 11128
Use Vectorize to create a vectorized wrapper around grepl, called vgrepl below, and call it in place of grepl. By default, Vectorize vectorizes over all arguments of the function; you can restrict it to specific arguments with vectorize.args. You get this warning because grepl is not vectorized over pattern, so only the first pattern is used:
vgrepl <- Vectorize(grepl)
# Equivalent here, naming the vectorized arguments explicitly: vgrepl <- Vectorize(grepl, vectorize.args = c('x', 'pattern'))
index$div <- with(index, paste(ifelse(div_raw != city & !is.na(city) & !vgrepl(city, div_raw), city, ""), div_raw))
Output:
> index
div_raw city div
1 Brussels Brussels Brussels
2 Paris Paris Paris
3 Paris I Paris Paris I
4 II Paris Paris II
5 total <NA> total
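Vectorize is built on mapply, so the same pairing can be written out directly. The sketch below is one possible variant, not the code from this answer: it assumes the city names should be matched literally (fixed = TRUE) rather than as regular expressions, and it builds the whole label inside ifelse so that no leading space is left when nothing is prepended. The name has_city is introduced here for illustration.

```r
# Same sample data as in the question
index <- data.frame(
  div_raw = c("Brussels", "Paris", "Paris I", "II", "total"),
  city    = c("Brussels", "Paris", "Paris", "Paris", NA),
  stringsAsFactors = FALSE
)

# Pair each city with its own district name; fixed = TRUE (an assumption)
# treats the city as a literal string, not a regular expression
has_city <- mapply(grepl, index$city, index$div_raw,
                   MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Prepend the city only where it is known and not already present
index$div <- ifelse(!is.na(index$city) & !has_city,
                    paste(index$city, index$div_raw),
                    index$div_raw)
```

Literal matching may matter if a city name contains characters such as "." or "(" that have a special meaning in a regular expression.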
Upvotes: 2
Reputation: 388982
You can use a vectorized alternative to grepl, namely stringr::str_detect:
index$div <- with(index, paste(ifelse(div_raw != city & !is.na(city) &
!stringr::str_detect(div_raw, city), city, ""), div_raw))
index
# div_raw city div
#1 Brussels Brussels Brussels
#2 Paris Paris Paris
#3 Paris I Paris Paris I
#4 II Paris Paris II
#5 total <NA> total
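If adding a package is not an option, base R's startsWith is also vectorized over both of its arguments. The sketch below swaps in a slightly different test, which is an assumption about the intent: it checks whether the district name begins with the city, not whether the city appears anywhere in it.

```r
# Same sample data as in the question
index <- data.frame(
  div_raw = c("Brussels", "Paris", "Paris I", "II", "total"),
  city    = c("Brussels", "Paris", "Paris", "Paris", NA),
  stringsAsFactors = FALSE
)

# startsWith() compares element by element and returns NA for a missing city
already_prefixed <- startsWith(index$div_raw, index$city)

# Prepend the city only where it is known and not already the prefix
index$div <- ifelse(!is.na(index$city) & !already_prefixed,
                    paste(index$city, index$div_raw),
                    index$div_raw)
```

Because the comparison is literal, no regular expression escaping is needed for the city names.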
Upvotes: 2