Antti

Reputation: 1293

How to erase all non-letter characters before the first letter (R vector of character strings)

I have a vector of character strings:

 cities <- c("London", "001 London", "Stockholm", "002 Stockholm")

I need to erase everything in each string that precedes the first letter, so that I end up with:

 cities <- c("London", "London", "Stockholm", "Stockholm")

I've tried this, for example:

 cities <- sub("^.*?[a-zA-Z]", "", cities)

but that erases the first letter too, which I don't want to happen.

Upvotes: 0

Views: 1099

Answers (3)

Shenglin Chen

Reputation: 4554

Delete the digits (note that this leaves the separating space in place):

 gsub('\\d+','',cities)
 [1] "London"     " London"    "Stockholm"  " Stockholm"
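A possible follow-up, assuming the digits only ever appear as a prefix: wrapping the call in base R's `trimws()` should remove the space that is left behind. This is a sketch rather than part of the original answer, and `cleaned` is just an illustrative name.

```r
cities <- c("London", "001 London", "Stockholm", "002 Stockholm")

# gsub() removes every run of digits, wherever it occurs in the string;
# trimws() then strips the whitespace left at either end.
cleaned <- trimws(gsub("\\d+", "", cities))
print(cleaned)
```

Keep in mind that this also deletes digits in the middle or at the end of a name, so it only fits data where that cannot happen.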

Upvotes: 0

Wiktor Stribiżew

Reputation: 626871

Use

cities <- c("London", "001 London", "Stockholm", "002 Stockholm")
gsub("^\\P{L}*", "", cities, perl=TRUE)

See IDEONE demo

The ^\\P{L}* regex means:

  • ^ - Assert the beginning of the string
  • \\P{L}* - 0 or more characters other than a letter.

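This solution is preferable if some city names start with a non-ASCII letter (for example "Örebro"), because \\P{L} treats any Unicode letter as a letter, whereas an ASCII class such as [a-zA-Z] does not.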
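To illustrate the difference, here is a small sketch comparing the two patterns on a name that starts with a non-ASCII letter. The sample data and the variable names `ascii_only` and `unicode_aware` are made up for this example; the string is written with a `\u` escape so that R marks it as UTF-8 regardless of the file encoding.

```r
# "\u00d6rebro" is "Örebro"; the escape keeps the example encoding-independent.
cities <- c("001 \u00d6rebro", "002 Stockholm", "London")

# ASCII-only class: the leading "Ö" is not in a-z/A-Z, so it is stripped too.
ascii_only <- sub("^[^a-zA-Z]*", "", cities)

# Unicode-aware class: \P{L} stops at the first character of any script
# that Unicode classifies as a letter. Requires perl = TRUE.
unicode_aware <- sub("^\\P{L}*", "", cities, perl = TRUE)

print(ascii_only)
print(unicode_aware)
```

With the Unicode-aware pattern the first element should come back as "Örebro", while the ASCII-only pattern is expected to cut it down to "rebro".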

Upvotes: 3

Avinash Raj

Reputation: 174706

Use a negated character class to match all the non-alphabetic characters at the start of the string.

cities <- sub("^[^a-zA-Z]*", "", cities)

or

Use a capturing group to capture the first letter and put it back with \\1.

cities <- sub("^.*?([a-zA-Z])", "\\1", cities)
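The following sketch puts the attempt from the question next to the two fixes above, to show why the original pattern loses a letter and where the two fixes differ. The extra element "123" and the variable names are assumptions added for illustration.

```r
cities <- c("London", "001 London", "Stockholm", "002 Stockholm", "123")

# Attempt from the question: the first letter is part of the match,
# so it is replaced by "" together with the prefix.
attempt <- sub("^.*?[a-zA-Z]", "", cities)

# Negated class: matches only the non-letters, so the first letter survives.
negated <- sub("^[^a-zA-Z]*", "", cities)

# Capturing group: the first letter is matched, but "\\1" writes it back.
captured <- sub("^.*?([a-zA-Z])", "\\1", cities)

print(attempt)
print(negated)
print(captured)
```

The two fixes agree on the city names but not on a string without any letters: the negated class reduces "123" to an empty string, while the capturing-group pattern finds no match and leaves "123" unchanged. Which behaviour is preferable depends on the data.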

Upvotes: 3
