Reputation: 3
Is there a way to include a pattern in the search but then not include it in the final output?
I'm trying to find a way to extract just the postal code from the source code of an address. My input is:
<strong class="street-address">
<address itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">10937 W Pico Blvd</span><br>
<span itemprop="addressLocality">Los Angeles</span>,
<span itemprop="addressRegion">CA</span>
<span itemprop="postalCode">90064</span>
</address>
</strong>
(The actual page source is much longer.) I want to search using this regex:
postalCode">[0-9]{5}
and then keep only the [0-9]{5} part, without the postalCode"> prefix at the beginning. The problem is that I have to search the whole page source, and there will inevitably be other 5-digit numbers somewhere in it. Is there any way to say, "look for postalCode"> and then take the next 5 characters if they match [0-9]{5}"?
Upvotes: 0
Views: 133
Reputation: 425198
Use a look behind:
(?<="postalCode">)\d{5}
Look behinds, which have the syntax (?<=...), assert that the given text immediately precedes the match, but do not consume it, so it is not part of the result. The match returned would be just the 5 digits.
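The question doesn't say which language or tool is in use, so here is a minimal sketch in Python's built-in re module, assumed purely for illustration. It runs the look behind against the sample markup from the question and, for comparison, shows a capturing group, which is an alternative in engines that lack look behind support. The variable names are made up for this example.

```python
import re

# Sample markup copied from the question.
html = '''<strong class="street-address">
<address itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">10937 W Pico Blvd</span><br>
<span itemprop="addressLocality">Los Angeles</span>,
<span itemprop="addressRegion">CA</span>
<span itemprop="postalCode">90064</span>
</address>
</strong>'''

# A bare \d{5} matches the first 5-digit number on the page (the street number).
first_five_digits = re.search(r'\d{5}', html).group()

# Look behind: "postalCode"> must precede the digits but is not part of the match.
zip_lookbehind = re.search(r'(?<="postalCode">)\d{5}', html).group()

# Alternative: match the prefix too, but read only the captured group.
zip_group = re.search(r'"postalCode">(\d{5})', html).group(1)

print(first_five_digits)  # 10937
print(zip_lookbehind)     # 90064
print(zip_group)          # 90064
```

Note that some engines, Python's included, require the look behind pattern to have a fixed width. That is fine here because the prefix is a literal string.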
Upvotes: 1