Fandom_Lover

Reputation: 71

Java - Using Regex to pull substrings out of dataset

Given the following dataset, I am trying to find a way to use Regex to pull out city names.

Boston (MA), New York City (NY, CT, NJ)

New York City (NY, CT, NJ), Philadelphia (PA, NJ)

Indianapolis (IN), St. Louis (MO, IL)

St. Louis (MO, IL), Kansas City (MO, KS)

I want the output of the Regex to be:

Boston,New York City

New York City,Philadelphia

Indianapolis,St. Louis

St. Louis,Kansas City

I attempted to pattern match based on two criteria:

(\\w+\\w(?=.())) | (\\w+\\W\\h\\w+(?=.()))

  1. Cities consisting of letters from [a-zA-Z]+ such as Boston or Philadelphia
  2. Cities whose names contain additional characters, such as periods or extra spaces (for example, St. Louis).

The expression accurately matches the first case. However, for the second case, it only matches the first occurrence of St. Louis.

I also tried the following:

(\\w+ ?\\w(?=.())) | (\\w+\\h\\w+\\h\\w+(?=\\s.()))| (\\w+\\h\\w+(?=\\s.()))

  1. The first alternative covers the same case as above: one-word cities.
  2. The third one matches New York City but, like the first pattern, fails to recognize later occurrences of the same form.
  3. The case carried over from the previous pattern, which had matched St. Louis, now fails to match it and matches Kansas City instead.

Upvotes: 0

Views: 45

Answers (1)

user1681317


This can be used for .NET:

(((?<=\())\w{2})|(\w{2}(?=\)))|(\w{2}(?=,))
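Since the question is about Java, here is a minimal sketch of a different approach from the pattern above: rather than matching the city names directly, it removes each parenthesized state list and then tidies the separators. The names `CityNames` and `extractCities` are hypothetical, chosen only for illustration, and the sketch assumes every state list sits in a single, non-nested pair of parentheses, as in the sample data.

```java
class CityNames {
    // Hypothetical helper: strips each parenthesized state list such as
    // " (NY, CT, NJ)" and then collapses ", " between cities to ",".
    // Assumes parentheses are never nested.
    static String extractCities(String line) {
        return line
                .replaceAll("\\s*\\([^)]*\\)", "") // drop "(...)" and the space before it
                .replaceAll("\\s*,\\s*", ",")      // normalize the city separator
                .trim();
    }

    public static void main(String[] args) {
        String[] rows = {
            "Boston (MA), New York City (NY, CT, NJ)",
            "New York City (NY, CT, NJ), Philadelphia (PA, NJ)",
            "Indianapolis (IN), St. Louis (MO, IL)",
            "St. Louis (MO, IL), Kansas City (MO, KS)"
        };
        for (String row : rows) {
            System.out.println(extractCities(row));
        }
    }
}
```

Because the commas inside the parentheses are removed together with the state codes, the only commas left are the ones between cities, so names with periods or several words (St. Louis, New York City) need no special handling.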

Upvotes: 0
