h.l.m

Reputation: 13475

Regex stringr to get letter next to a number

In R, I would like to use grep, grepl or gsub to find all elements in a vector of strings that contain an A road, B road or M road name.

Please see the example below:

tmp <- c('Little Street','A323', 'Essex Road (A43)', 'M43','Orange street','M4','B2045','New Street')

And I would like a function to return...

c('Minor Road','A Road', 'A Road', 'M Road', 'Minor Road', 'M Road','B Road','Minor Road')

My first thought was to use something like

grepl('[0-9]',tmp)

but this only tells me whether a string contains a digit; it can't distinguish between A roads, B roads and M roads.

As always, any help would be greatly appreciated.

Upvotes: 1

Views: 167

Answers (4)

G. Grothendieck

Reputation: 269764

This can be done in a single strapply statement, which returns the captured character followed by " Road" for each input component that has a non-digit character (here, a letter) followed by a digit. For any non-matched component it returns "Minor Road":

library(gsubfn)

strapply(tmp, "(\\D)\\d", ~ paste(x, "Road"), empty = "Minor Road", simplify = TRUE)

giving:

[1] "Minor Road" "A Road"     "A Road"     "M Road"     "Minor Road"
[6] "M Road"     "B Road"     "Minor Road"

Update: Simplified answer down to one statement.
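
As a side note on the pattern above, here is a minimal base R sketch (not part of the original answer) comparing "(\\D)\\d" with a stricter "([ABM])\\d". The input "Flat 4" and the helper name first_capture are hypothetical and chosen only for illustration.

```r
# Hypothetical inputs; "Flat 4" does not appear in the original question
x <- c("A323", "Flat 4", "Essex Road (A43)")

# Hypothetical helper: first capture group of the first match, NA if no match
first_capture <- function(pattern, x) {
  m <- regmatches(x, regexec(pattern, x))
  vapply(m, function(g) if (length(g) >= 2) g[2] else NA_character_,
         character(1))
}

loose  <- first_capture("(\\D)\\d", x)    # any non-digit before a digit
strict <- first_capture("([ABM])\\d", x)  # only A, B or M before a digit
```

In this sketch the looser pattern captures the space in "Flat 4", while the stricter one yields NA there, which could then be mapped to "Minor Road". For the vector in the question both patterns give the same road letters.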

Upvotes: 1

Jim

Reputation: 4767

Using rex may make this type of task a little simpler.

tmp <- c('Little Street','A323', 'Essex Road (A43)', 'M43','Orange street','M4','B2045','New Street')

library(rex)
classify_road <- function(x) {
  res <- re_matches(x,
    rex(
      capture(name = "type",
        upper
      ),
      digit
    )
  )

  res$type[ is.na(res$type) ] <- "Minor"
  paste(res$type, "Road")
}

classify_road(tmp)
#>[1] "Minor Road" "A Road"     "A Road"     "M Road"     "Minor Road"
#>[6] "M Road"     "B Road"     "Minor Road"

Upvotes: 0

hwnd

Reputation: 70732

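You could break it down into steps using grepl and sub:

# Step 1: label every element without an A/B/M road number (this overwrites tmp)
tmp[!grepl('[ABM]\\d', tmp)] <- 'Minor Road'
# Step 2: reduce the remaining elements to the road letter plus ' Road'
sub('.*([ABM])\\d.*', '\\1 Road', tmp)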
# [1] "Minor Road" "A Road"     "A Road"     "M Road"     "Minor Road"
# [6] "M Road"     "B Road"     "Minor Road"
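
The two steps above modify tmp in place. If the original vector should stay untouched, the same two patterns could be combined with ifelse, roughly as follows. This is a sketch rather than part of the original answer, and the variable names is_road and road_class are assumptions.

```r
tmp <- c('Little Street', 'A323', 'Essex Road (A43)', 'M43',
         'Orange street', 'M4', 'B2045', 'New Street')

# TRUE where an A, B or M is directly followed by a digit
is_road <- grepl('[ABM]\\d', tmp)

# Keep the road letter for matches, fall back to 'Minor Road' otherwise;
# tmp itself is left unchanged
road_class <- ifelse(is_road,
                     sub('.*([ABM])\\d.*', '\\1 Road', tmp),
                     'Minor Road')
```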

Upvotes: 3

MrFlick

Reputation: 206308

How about this:

tmp <- c('Little Street','A323', 'Essex Road (A43)', 'M43','Orange street','M4','B2045','New Street')

road <- rep("Minor", length(tmp))
m <- regexpr("\\b[ABM]\\d+", tmp)
road[m!=-1] <- substr(regmatches(tmp, m),1,1)
paste(road, "Road")

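We use regexpr() and regmatches() to find and extract an A, B, or M followed by one or more digits; substr() then keeps only the leading letter of each match. Elements without a match keep the default "Minor".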
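
For reuse, the same steps could be wrapped in a small function. This is a sketch under the assumption that the input contains no NA values; the function name road_type is hypothetical.

```r
# Hypothetical wrapper around the regexpr()/regmatches() steps
road_type <- function(x) {
  road <- rep("Minor", length(x))
  # Start position of the first match per element, -1 where nothing matches
  m <- regexpr("\\b[ABM]\\d+", x)
  # regmatches() returns only the matched substrings, one per hit,
  # so they line up with the positions where m != -1
  road[m != -1] <- substr(regmatches(x, m), 1, 1)
  paste(road, "Road")
}

tmp <- c('Little Street', 'A323', 'Essex Road (A43)', 'M43',
         'Orange street', 'M4', 'B2045', 'New Street')
result <- road_type(tmp)
```

The \\b word boundary means the letter must start a word, so "(A43)" matches while a letter in the middle of a longer token would not.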

Upvotes: 5
