user1946217

Regex for extracting only numbers and alphanumeric tokens from a string in R

Hi, I need a regex that extracts the numbers, and any tokens combining numbers and letters, from a string.

Example: "4596 2B FC JAIN BHAWAN" --> I want "4596 2B" as my output

> gsub("\\S([a-zA-Z])+\\S", "", "4596 2B FC JAIN BHAWAN")
[1] "4596 2B FC  "

I do not understand why the above regex did not replace FC with ""

Any help is appreciated. Thanks!

Answers (1)

Hugh

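You are using \\S (capital), which means "not a whitespace character". Your pattern therefore needs at least three consecutive non-space characters: one non-space, one or more letters, then another non-space. "FC" is only two characters long, so it never matches, while "JAIN" and "BHAWAN" do. Use the lower-case \\s (a whitespace character) instead, and only once, before the letters, because your string does not end with a space: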

gsub("\\s[a-zA-Z]+", "", "4596 2B FC JAIN BHAWAN")
[1] "4596 2B"
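To see why "FC" survives, it can help to list what each pattern actually matches instead of what it deletes. A minimal sketch using base R's gregexpr() and regmatches(); the variable names are only for illustration, and the lower-case pattern is written without the capture group, which does not change what it matches:

```r
x <- "4596 2B FC JAIN BHAWAN"

# Original pattern: a non-space, one or more letters, then another non-space,
# so every match is at least three characters long and "FC" is skipped.
old_matches <- regmatches(x, gregexpr("\\S([a-zA-Z])+\\S", x))[[1]]
# old_matches is c("JAIN", "BHAWAN")

# Lower-case version: one whitespace character followed by one or more letters.
new_matches <- regmatches(x, gregexpr("\\s[a-zA-Z]+", x))[[1]]
# new_matches is c(" FC", " JAIN", " BHAWAN")

# Deleting those matches leaves only the number and the number+letter token.
result <- gsub("\\s[a-zA-Z]+", "", x)
# result is "4596 2B"
```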

Using Simon's suggestion, the pattern becomes easier to read (\\b matches a word boundary, so only whole words made entirely of letters are removed):

gsub("\\b[a-zA-Z]+\\b", "", "aa 4592 2B FC JAIN BHAWAN")
[1] " 4592 2B   "

though I might need some help to get rid of the leading space (and the trailing spaces left behind by the removed words). I could just nest gsub() calls, but that feels like cheating.
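One way to handle the leftover spaces without a second gsub() is base R's trimws() (available since R 3.2.0). Another is to extract the wanted tokens rather than delete the unwanted ones, which is closer to what the question asks for. A sketch of both, assuming the tokens are separated by whitespace; the variable names are illustrative only:

```r
x <- "aa 4592 2B FC JAIN BHAWAN"

# Word-boundary deletion from the answer: leaves a leading space plus the
# spaces that used to separate the removed words.
stripped <- gsub("\\b[a-zA-Z]+\\b", "", x)
# stripped is " 4592 2B   "

# trimws() removes leading and trailing whitespace only.
trimmed <- trimws(stripped)
# trimmed is "4592 2B"

# Alternative (assumes whitespace-separated tokens): split the string and
# keep only the tokens that contain at least one digit.
tokens <- strsplit(x, "\\s+")[[1]]
kept <- paste(tokens[grepl("[0-9]", tokens)], collapse = " ")
# kept is "4592 2B"
```

trimws() only touches the ends of the string, so if letter-only words sit between the numbers (e.g. "12 AB 3C"), the gsub() result still contains a doubled interior space. The split-and-filter version avoids that because it rebuilds the string from the kept tokens.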
