Regular expression to extract city, state, and country from HTML

Question

I am using OutWit Hub to scrape a website for city, state, and country (USA and Canada only). The program lets me use regular expressions to define the markers Before and After the text I wish to grab, and I can also define a format for the desired text.

Here is a sample of the html:

                        

BILLINGS, MT
USA

I have set up my regex as follows:

CITY - Before (not formatted as regex)

The issue arises when there is no city or state listed. I have tried to account for this, but I am only making it worse. Is there any way this can be cleaned up while still accounting for the possibility of missing info? Thank you.

Example with no city:

Example with no city / state: (yes, there is an extra line break)

USA

Thank you for any help you can provide.

aljo · Accepted Answer

Here is what you can do if you have the pro version:

Description: Data
Before: 
After: 
Format: (([\w \-]+),)? ?([A-Z]{2})?[
](USA|canada)\s*
Replace: \2##\3##\4
Separator: ##
Labels: City,State,Country
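
To try the combined pattern outside OutWit Hub, here is a minimal Python sketch of the same idea. It rests on several assumptions: the HTML tags have already been stripped, the line break between the state and the country is matched with `[\r\n]`, the country match is made case-insensitive, and the function name `split_location` and the no-city input are hypothetical. It is not how OutWit Hub applies the scraper internally.

```python
import re

# Same group layout as the Format above:
#   group 2 = city, group 3 = state, group 4 = country.
# Assumption: [\r\n] stands in for the line break between state and country.
PATTERN = re.compile(
    r"(([\w \-]+),)? ?([A-Z]{2})?\s*[\r\n]\s*((?i:USA|canada))\s*"
)

LABELS = ("City", "State", "Country")


def split_location(text):
    """Return a dict of City/State/Country, with '' for missing parts."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # Mirror "Replace: \2##\3##\4"; unmatched groups expand to ''.
    joined = match.expand(r"\2##\3##\4")
    # Mirror "Separator: ##".
    parts = [part.strip() for part in joined.split("##")]
    return dict(zip(LABELS, parts))


print(split_location("BILLINGS, MT\nUSA"))
print(split_location("MT\nUSA"))      # hypothetical input with no city
print(split_location("\n\nUSA"))      # no city or state, extra line break
```

Because the city and state groups are both optional, a missing part comes back as an empty string rather than making the whole match fail.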

If you are using the light version, you have to do it in three lines:

Description: City
Before: 
After: ,
Format: [^<>]+

Description: State
Before: /[
]([^<>
 ]+,)?/
After: /[
]/
Format: [A-Z]{2}

Description: Country
Before:
After: 
Format: (USA|canada)
