Vivek Puurkayastha

Reputation: 536

Match values in pairs from Html using RegEx

I need to use regex only to extract the values in pairs from the following input:

<li>
  <div class="col-3"> Packaged Quantity </div>
  <div class="col-5"> 1 </div>
</li>
<li>
  <div class="col-3"> Width </div>
  <div class="col-5"> 14.7 cm </div>
</li>

So far I have tried:

(?<=class=\"col-3\">)[^<]+|(?<=class=\"col-5\">)[^<]+

This gives me four separate matches, but I want two matches with two groups in each. I know I could use XPath to do the same, but I am limited to regex because of constraints I can't comment on.

Upvotes: 0

Views: 62

Answers (1)

CertainPerformance

Reputation: 370759

You can match the col-3"> at the start, then capture non-< characters for the first group, match </div> followed by non-> characters up to the next > (the opening tag of the following div), and capture non-< characters again for the second group:

col-3">([^<]+)<\/div>[^>]+>([^<]+)

https://regex101.com/r/YAZFvV/1
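
As a rough illustration, here is how that pattern could be applied in Python; the question does not say which regex engine is in use, so the language, the `html` variable, and the `PAIR_RE` name are assumptions made for this sketch. The captured groups include the whitespace around the text, so they are trimmed afterwards.

```python
import re

# Sample input copied from the question.
html = """<li>
  <div class="col-3"> Packaged Quantity </div>
  <div class="col-5"> 1 </div>
</li>
<li>
  <div class="col-3"> Width </div>
  <div class="col-5"> 14.7 cm </div>
</li>"""

# Same pattern as above; "/" needs no escaping in Python.
PAIR_RE = re.compile(r'col-3">([^<]+)</div>[^>]+>([^<]+)')

# One match per <li>, with the label in group 1 and the value in group 2.
pairs = [(m.group(1).strip(), m.group(2).strip()) for m in PAIR_RE.finditer(html)]

for label, value in pairs:
    print(f"{label}: {value}")
```

Note that the pattern relies on the col-5 div coming immediately after the col-3 div; any other tag in between would end up in the second group's position.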

(that said, if at all possible, it would be better to use a proper HTML parser for this sort of thing)
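
For comparison, a parser-based version might look something like the sketch below, using Python's standard `html.parser` module. The `PairParser` class and its attributes are hypothetical names invented for this example, and it assumes each col-3 div is followed by its col-5 div, as in the sample input.

```python
from html.parser import HTMLParser


class PairParser(HTMLParser):
    """Collects (label, value) pairs from col-3 / col-5 divs (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # class of the div being read, if it is col-3 or col-5
        self._label = None    # most recent col-3 text waiting for its value

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            cls = dict(attrs).get("class")
            if cls in ("col-3", "col-5"):
                self._current = cls

    def handle_data(self, data):
        text = data.strip()
        if not text or self._current is None:
            return
        if self._current == "col-3":
            self._label = text
        elif self._label is not None:
            self.pairs.append((self._label, text))
            self._label = None

    def handle_endtag(self, tag):
        if tag == "div":
            self._current = None


sample = """<li>
  <div class="col-3"> Packaged Quantity </div>
  <div class="col-5"> 1 </div>
</li>
<li>
  <div class="col-3"> Width </div>
  <div class="col-5"> 14.7 cm </div>
</li>"""

parser = PairParser()
parser.feed(sample)
print(parser.pairs)
```

Unlike the regex, this version keys on the class attribute itself rather than on the exact characters between the two divs, so changes in whitespace or attribute order should not affect it.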

Upvotes: 1
