sarasreddy74

Reputation: 55

How do I match this pattern in R

I need to match only the first country name in the string below. Country names are written entirely in upper-case letters. I used the following pattern, but it matches every country instead of just the first one.

'\\b[A-Z]{2,}.\\b'

For example, from the string below I want only UNITED KINGDOM:

x = "~ London, Greater London ~ UNITED KINGDOM;~ Ottawa, Ontario ~ CANADA;~,~ AUSTRALIA;~,~ POLAND;~,~ USA"

Upvotes: 0

Views: 65

Answers (1)

Frank

Reputation: 66819

This seems to work:

regmatches(x, regexpr('\\b[A-Z ]{2,}\\b', x))
# [1] "UNITED KINGDOM"

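The only change is a space inside the character class, making it [A-Z ], so that a multi-word name such as UNITED KINGDOM matches as a single unit. Note that regexpr returns only the first match, while gregexpr returns all of them (analogous to sub vs. gsub).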
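To make the regexpr vs. gregexpr difference concrete, here is a minimal sketch using the string from the question. The variable names pattern, first_match and all_matches are my own, chosen for illustration; only base R functions are used.

```r
# Input string from the question
x <- "~ London, Greater London ~ UNITED KINGDOM;~ Ottawa, Ontario ~ CANADA;~,~ AUSTRALIA;~,~ POLAND;~,~ USA"

# Runs of two or more upper-case letters or spaces, bounded by word boundaries
pattern <- "\\b[A-Z ]{2,}\\b"

# regexpr() locates only the first match in each element of x
first_match <- regmatches(x, regexpr(pattern, x))
first_match
# [1] "UNITED KINGDOM"

# gregexpr() locates every match; regmatches() then returns a list with
# one character vector per element of x, so take the first list element
all_matches <- regmatches(x, gregexpr(pattern, x))[[1]]
all_matches
```

Since all_matches holds every match in order, all_matches[1] gives the same result as first_match, but regexpr avoids scanning for matches you do not need.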

For more info, I recommend the official docs at ?regexpr.

Upvotes: 2
