Larry

Reputation: 3

Regex: How to search for a pattern but not include it in the output

Is there a way to include a pattern in the search but then not include it in the final output?

I'm trying to find a way to pull a single field (here, the postal code) out of the HTML source of an address. So my input is

<strong class="street-address">
 <address itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">10937 W Pico Blvd</span><br>
<span itemprop="addressLocality">Los Angeles</span>, 
<span itemprop="addressRegion">CA</span> 
<span itemprop="postalCode">90064</span>
        </address>

        </strong>

(the actual page source is much longer), and I want to search it with this regex:

postalCode">[0-9]{5}

and then extract only the part matched by [0-9]{5}, without the postalCode"> prefix. The problem is that I have to search the whole page source, which will inevitably contain other five-digit numbers. Is there any way to say "look for postalCode"> and then take the next five characters if they match [0-9]{5}"?

Upvotes: 0

Views: 133

Answers (1)

Bohemian

Reputation: 425198

Use a lookbehind:

(?<="postalCode">)\d{5}

A lookbehind, written (?<=...), asserts that the text immediately before the match fits the given pattern, but does not consume that text or include it in the match. The match returned is therefore just the five digits.
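
The question does not say which language or regex engine is in use, so here is a minimal sketch in Python (an assumption on my part) using the standard re module. The html string is a shortened copy of the snippet from the question, and the variable names are only illustrative. It also shows a capture group as an alternative, which is useful in engines that lack lookbehind support.

```python
import re

# Shortened copy of the address markup from the question.
html = '''<span itemprop="streetAddress">10937 W Pico Blvd</span><br>
<span itemprop="addressLocality">Los Angeles</span>,
<span itemprop="addressRegion">CA</span>
<span itemprop="postalCode">90064</span>'''

# Option 1: lookbehind. The prefix must be present but is not part of the match.
lookbehind_match = re.search(r'(?<="postalCode">)\d{5}', html)
zip_from_lookbehind = lookbehind_match.group(0) if lookbehind_match else None

# Option 2: capture group. The prefix is matched, but only group 1 is read.
group_match = re.search(r'"postalCode">(\d{5})', html)
zip_from_group = group_match.group(1) if group_match else None

# Other five-digit numbers (such as the street number) are not picked up.
all_zips = re.findall(r'(?<="postalCode">)\d{5}', html)

print(zip_from_lookbehind, zip_from_group, all_zips)
```

Both options return only the digits. The lookbehind keeps the whole match clean, while the capture group works in more engines at the cost of having to read group 1 instead of the full match.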

Upvotes: 1
