syre

Reputation: 982

grepl functionality in a vectorized conditional statement

How can I get the functionality of grepl, applied element-wise to a vector of patterns, inside a vectorized conditional statement?

I wish to transform raw data of geographical divisions (mixed with other categories) by prepending the city name where it is missing from district names:

#Build index dataframe
(index <- data.frame(div_raw=c("Brussels", "Paris", "Paris I", "II", "total"), 
                     city=c("Brussels", "Paris", "Paris", "Paris", NA)))
#   div_raw     city
#1 Brussels Brussels
#2    Paris    Paris
#3  Paris I    Paris
#4       II    Paris
#5    total     <NA>

#Prepend city name to district names, where available
index$div <- with(index, paste(ifelse(div_raw != city & !is.na(city), city, ""), div_raw))
index
#   div_raw     city           div
#1 Brussels Brussels      Brussels
#2    Paris    Paris         Paris
#3  Paris I    Paris Paris Paris I
#4       II    Paris      Paris II
#5    total     <NA>         total

As can be seen, we should also test whether the city is already included in the district name. However, grepl is not vectorized over its pattern argument: it uses only the first element of the pattern vector instead of matching each row against its own city:

index$div <- with(index, paste(ifelse(div_raw != city & !is.na(city) & !grepl(city, div_raw), city, ""), div_raw))
#Warning message:
#In grepl(city, div_raw) :
#  argument 'pattern' has length > 1 and only the first element will be used

index
#   div_raw     city           div
#1 Brussels Brussels      Brussels
#2    Paris    Paris         Paris
#3  Paris I    Paris Paris Paris I
#4       II    Paris      Paris II
#5    total     <NA>         total

The expected result:

index
#   div_raw     city           div
#1 Brussels Brussels      Brussels
#2    Paris    Paris         Paris
#3  Paris I    Paris       Paris I
#4       II    Paris      Paris II
#5    total     <NA>         total

Upvotes: 1

Views: 63

Answers (2)

PKumar

Reputation: 11128

Change your code to use Vectorize: replace grepl with vgrepl as shown below. Vectorize returns a wrapper that applies the function element-wise over its arguments (all of them by default; you can choose which ones with vectorize.args). grepl is not vectorized over pattern, which is why you get this warning and only the first city is used:

vgrepl <- Vectorize(grepl)
# Alternatively, vectorize only over the arguments that matter: vgrepl <- Vectorize(grepl, vectorize.args = c('x', 'pattern'))

index$div <- with(index, paste(ifelse(div_raw != city & !is.na(city) & !vgrepl(city, div_raw), city, ""), div_raw))

Output:

> index
   div_raw     city       div
1 Brussels Brussels  Brussels
2    Paris    Paris     Paris
3  Paris I    Paris   Paris I
4       II    Paris  Paris II
5    total     <NA>     total
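
Two caveats may be worth checking with this approach: grepl treats each city name as a regular expression, and paste() leaves a leading space whenever the prefix is empty (the right-aligned printout hides it). Below is a minimal base R sketch that addresses both, assuming literal matching is what is wanted. The helper name contains_city is hypothetical, and the div_raw != city test is dropped because a name equal to the city also contains it:

```r
# Rebuild the example data from the question
index <- data.frame(
  div_raw = c("Brussels", "Paris", "Paris I", "II", "total"),
  city    = c("Brussels", "Paris", "Paris", "Paris", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: match each pattern literally (fixed = TRUE, no regex)
# against the string in the same position
contains_city <- function(pattern, x) {
  unname(mapply(grepl, pattern, x, MoreArgs = list(fixed = TRUE)))
}

index$div <- with(index, {
  # Rows with an NA city never get a prefix, because FALSE & NA is FALSE
  needs_prefix <- !is.na(city) & !contains_city(city, div_raw)
  # trimws() drops the leading space that paste() adds when the prefix is ""
  trimws(paste(ifelse(needs_prefix, city, ""), div_raw))
})
```

mapply does the same element-wise pairing that Vectorize sets up internally, so this is a variation on the answer above rather than a different technique.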

Upvotes: 2

Ronak Shah

Reputation: 388982

You can use an equivalent of grepl that is vectorized over the pattern as well as the string, i.e. stringr::str_detect:

index$div <- with(index, paste(ifelse(div_raw != city & !is.na(city) & 
                 !stringr::str_detect(div_raw, city), city, ""), div_raw))
index

#   div_raw     city       div
#1 Brussels Brussels  Brussels
#2    Paris    Paris     Paris
#3  Paris I    Paris   Paris I
#4       II    Paris  Paris II
#5    total     <NA>     total
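
Note that str_detect interprets its pattern as a regular expression by default. If city names may contain regex metacharacters, wrapping the pattern in stringr::fixed() should request a literal comparison instead. A sketch under that assumption, with stringr installed; the variable has_city is introduced here for illustration only:

```r
library(stringr)

div_raw <- c("Brussels", "Paris", "Paris I", "II", "total")
city    <- c("Brussels", "Paris", "Paris", "Paris", NA)

# fixed() asks for literal matching; str_detect() pairs each string
# with the pattern in the same position
has_city <- str_detect(div_raw, fixed(city))

# Prepend the city only where it is known and not already present;
# trimws() removes the leading space left by paste() for empty prefixes
div <- trimws(paste(ifelse(!is.na(city) & !has_city, city, ""), div_raw))
```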

Upvotes: 2
