Reputation: 13475
In R, I would like to use grep, grepl or gsub to find all elements in a vector of strings that contain an A road, M road or B road name.
Please see the example below:
tmp <- c('Little Street','A323', 'Essex Road (A43)', 'M43','Orange street','M4','B2045','New Street')
And I would like a function to return...
c('Minor Road','A Road', 'A Road', 'M Road', 'Minor Road', 'M Road','B Road','Minor Road')
My first thought was to use something like
grepl('[0-9]',tmp)
but this can't distinguish between A roads, B roads and M roads.
As always, any help would be greatly appreciated.
Upvotes: 1
Views: 167
Reputation: 269764
This can be done in a single strapply statement, which returns the letter followed by " Road" for each input component that has a letter followed by a digit. For any non-matched component it returns "Minor Road":
library(gsubfn)
strapply(tmp, "(\\D)\\d", ~ paste(x, "Road"), empty = "Minor Road", simplify = TRUE)
giving:
[1] "Minor Road" "A Road" "A Road" "M Road" "Minor Road"
[6] "M Road" "B Road" "Minor Road"
Update: Simplified answer down to one statement.
Upvotes: 1
Reputation: 4767
Using rex may make this type of task a little simpler.
tmp <- c('Little Street','A323', 'Essex Road (A43)', 'M43','Orange street','M4','B2045','New Street')
library(rex)
classify_road <- function(x) {
  res <- re_matches(x,
    rex(
      capture(name = "type",
        upper
      ),
      digit
    )
  )
  res$type[ is.na(res$type) ] <- "Minor"
  paste(res$type, "Road")
}
classify_road(tmp)
#>[1] "Minor Road" "A Road" "A Road" "M Road" "Minor Road"
#>[6] "M Road" "B Road" "Minor Road"
Upvotes: 0
Reputation: 70732
You could break it down into steps using grepl and sub:
> tmp[!grepl('[ABM]\\d', tmp)] <- 'Minor Road'
> sub('.*([ABM])\\d.*', '\\1 Road', tmp)
# [1] "Minor Road" "A Road" "A Road" "M Road" "Minor Road"
# [6] "M Road" "B Road" "Minor Road"
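The two steps above overwrite tmp in place. As a minimal sketch, assuming you would rather keep the input vector unchanged, the same grepl and sub calls can be wrapped in a function; the name road_type is hypothetical and not part of the original answer:

```r
# Hypothetical helper: same patterns as above, but the input is not modified
road_type <- function(x) {
  # TRUE where an A, B or M is immediately followed by a digit
  has_road_number <- grepl("[ABM]\\d", x)
  # For matching elements keep only the captured letter plus " Road";
  # everything else falls back to "Minor Road"
  ifelse(has_road_number,
         sub(".*([ABM])\\d.*", "\\1 Road", x),
         "Minor Road")
}

tmp <- c('Little Street','A323', 'Essex Road (A43)', 'M43','Orange street','M4','B2045','New Street')
road_type(tmp)
```

Because ifelse is vectorised, this returns one label per element of tmp and leaves tmp itself untouched.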
Upvotes: 3
Reputation: 206308
How about this:
tmp <- c('Little Street','A323', 'Essex Road (A43)', 'M43','Orange street','M4','B2045','New Street')
road <- rep("Minor", length(tmp))
m <- regexpr("\\b[ABM]\\d+", tmp)
road[m!=-1] <- substr(regmatches(tmp, m),1,1)
paste(road, "Road")
We use regmatches() and regexpr() to find and extract an A, B or M followed by one or more digits. The leading \\b requires the letter to start a word.
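To illustrate what the word boundary buys you, here is a small sketch that wraps the code above in a function (the name road_class and the "CAM4 Lane" input are made up for the example):

```r
# Hypothetical wrapper around the regexpr()/regmatches() approach
road_class <- function(x) {
  road <- rep("Minor", length(x))
  # Position of the first A, B or M that starts a word and is followed by digits
  m <- regexpr("\\b[ABM]\\d+", x)
  # regmatches() returns only the matched strings; keep their first letter
  road[m != -1] <- substr(regmatches(x, m), 1, 1)
  paste(road, "Road")
}

road_class(c("A323", "CAM4 Lane", "Essex Road (A43)"))
# [1] "A Road"     "Minor Road" "A Road"

# Without \\b the "M4" inside "CAM4" would count as a match
grepl("[ABM]\\d", "CAM4 Lane")
# [1] TRUE
```

Whether a name like "CAM4" should count as a road number depends on your data, so treat the boundary as a design choice rather than a requirement.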
Upvotes: 5