Vincent L

Reputation: 739

Regex to extract text between brackets

I need to extract the number 5 in the brackets in this HTML code:

<td class="th-clr-cel th-clr-td th-clr-pad th-clr-cel-dis" style="width:226px; text-align:left; ">
<span class="th-tx th-tx-value th-tx-nowrap"  style="width:100&#x25;; "  title="Social&#x20;Insurance&#x20;Number&#x20;&#x28;SIN&#x29;" id="C29_W120_V121_builidnumber_table[5].type_text" f2="C;40">
    Social&#x20;Insurance&#x20;Number&#x20;&#x28;SIN&#x29;
</span>

This is only an excerpt; the full page has much more HTML before and after it. However, the word "Insurance" appears only in this excerpt.

I managed to match whatever is between the 2 instances of "Social Insurance Number" with this regex:

((?<=Social&#x20;Insurance&#x20;Number)(.*)(?=Social&#x20;Insurance&#x20;Number))

Now I need to combine that and extract the number 5 within the square brackets.

Please note: the content of the brackets could be multiple characters (e.g. 15), but it will always be a number.

Thank you

EDIT: I want to use regex rather than an HTML parser because this is part of a JMeter script that runs mass performance tests on a system with hundreds of concurrent users. Performance is a major factor here, and an XML parser would consume more resources than a regex.

Upvotes: 0

Views: 1627

Answers (3)

hitesh bedre

Reputation: 589

This captures the digits inside the square brackets, provided they sit between two occurrences of the term "Insurance":

Insurance(?:[\s\S]*)\[(\d+)\](?:[\s\S]*)Insurance

Demo: https://regex101.com/r/hwFB0Y/3
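
As a quick check outside JMeter, the pattern above can be run against the sample from the question. This is only a minimal sketch: Python's `re` module is an assumption made for illustration (JMeter uses a different regex engine, which may behave differently in edge cases), the HTML is shortened, and the variable names are hypothetical.

```python
import re

# Shortened copy of the sample from the question (most attributes dropped).
html = (
    '<span title="Social&#x20;Insurance&#x20;Number&#x20;&#x28;SIN&#x29;" '
    'id="C29_W120_V121_builidnumber_table[5].type_text" f2="C;40">\n'
    '    Social&#x20;Insurance&#x20;Number&#x20;&#x28;SIN&#x29;\n'
    '</span>'
)

# The pattern from this answer; group 1 holds the digits inside the brackets.
pattern = re.compile(r'Insurance(?:[\s\S]*)\[(\d+)\](?:[\s\S]*)Insurance')

match = pattern.search(html)
number = match.group(1) if match else None
print(number)
```

Because `[\s\S]` also matches newlines, the pattern does not depend on any "dot matches newline" flag being set.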

Upvotes: 2

Niel Godfrey P. Ponciano

Reputation: 10719

Try this:

Insurance.*\[(\d+)\]

Or, if you want to match it only between the two occurrences of "Insurance":

Insurance.*\[(\d+)\][\s\S]+?Insurance

Demo here.

Where

  • Insurance - Match the starting word "Insurance"
  • .* - Match any run of characters except newlines (greedy)
  • \[ - Match the opening bracket
  • (\d+) - Capture the numerical value inside brackets
  • \] - Match the closing bracket
  • [\s\S]+? - Match any character (including newlines) in a non-greedy way so that it wouldn't span across multiple "Insurance" words
  • Insurance - Match the ending word "Insurance"
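
The difference between `.*` and `[\s\S]+?` in the breakdown above can be checked with a small experiment. This is a sketch under assumptions: it uses Python's `re` module for illustration rather than JMeter's engine, the sample is shortened, and `split_sample` is a hypothetical variant of the page in which a line break separates the first "Insurance" from the bracket.

```python
import re

# Shortened copy of the sample from the question.
sample = (
    'title="Social&#x20;Insurance&#x20;Number" '
    'id="C29_W120_V121_builidnumber_table[5].type_text" f2="C;40">\n'
    '    Social&#x20;Insurance&#x20;Number&#x20;&#x28;SIN&#x29;\n'
)

# Hypothetical variant: the id attribute starts on its own line.
split_sample = sample.replace('" id=', '"\nid=')

first = r'Insurance.*\[(\d+)\]'
second = r'Insurance.*\[(\d+)\][\s\S]+?Insurance'

def extract(pattern, text, flags=0):
    """Return capture group 1 of the first match, or None if nothing matches."""
    m = re.search(pattern, text, flags)
    return m.group(1) if m else None

same_line = extract(first, sample)           # bracket on the same line
between = extract(second, sample)            # closing "Insurance" on the next line
split_default = extract(first, split_sample) # '.' stops at the line break
split_dotall = extract(first, split_sample, re.DOTALL)

print(same_line, between, split_default, split_dotall)
```

If the real page can break the line before the bracket, either replace `.*` with `[\s\S]*?` or enable the engine's "dot matches newline" mode.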

Upvotes: 1

Gonnen Daube

Reputation: 317

Is that what you're looking for?

((?<=Social&#x20;Insurance&#x20;Number.*\[)(\d+)(?=\].*Social&#x20;Insurance&#x20;Number))
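
One caveat: the lookbehind in the pattern above contains `.*`, so its width is variable, and not every regex engine accepts that. The sketch below uses Python's `re` module purely as an illustration (an assumption; JMeter's engine is different and should be tested separately). It shows the pattern being rejected there, next to a capture-group variant that needs no lookarounds. The variable names are hypothetical.

```python
import re

lookbehind = (
    r'((?<=Social&#x20;Insurance&#x20;Number.*\[)(\d+)'
    r'(?=\].*Social&#x20;Insurance&#x20;Number))'
)

# Python's re requires fixed-width lookbehinds, so compiling this fails.
try:
    re.compile(lookbehind)
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Capture-group variant: the digits land in group 1 instead of the whole match.
# re.DOTALL lets '.' cross the line break before the second occurrence.
capture = re.compile(
    r'Social&#x20;Insurance&#x20;Number.*?\[(\d+)\]'
    r'.*?Social&#x20;Insurance&#x20;Number',
    re.DOTALL,
)

# Shortened copy of the sample from the question.
html = (
    'title="Social&#x20;Insurance&#x20;Number&#x20;&#x28;SIN&#x29;" '
    'id="C29_W120_V121_builidnumber_table[5].type_text" f2="C;40">\n'
    '    Social&#x20;Insurance&#x20;Number&#x20;&#x28;SIN&#x29;\n'
)

match = capture.search(html)
number = match.group(1) if match else None
print(lookbehind_ok, number)
```

With the capture-group form, the extractor has to read group 1 rather than the full match.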

Upvotes: 1
