Rich

Reputation: 1779

Extract text from HTML with JavaScript regex

I am trying to parse a webpage and get the number reference after <li>YM#. For example, I need to get 1234-234234 into a variable from HTML that contains:

<li>YM# 1234-234234         </li>

Many thanks for any help!

Rich

Upvotes: 1

Views: 357

Answers (3)

Cylian

Reputation: 11181

Try this:

(<li>[^#<>]*?# *)([\d\-]+)\b

and read the number from the second capture group ($2 in a replacement string, or match[2] when using exec).
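A minimal sketch of using this pattern from JavaScript, assuming the page's HTML is already in a string. The helper name `extractYmNumber` and the sample `html` are hypothetical, for illustration only. Group 1 holds the `<li>...#` prefix and group 2 holds the number, so the result is `match[2]`:

```javascript
// Hypothetical helper: returns the reference after "<li>YM#", or null if absent.
function extractYmNumber(html) {
  // Group 1 is the "<li>...#" prefix, group 2 is the number itself.
  var regex = /(<li>[^#<>]*?# *)([\d\-]+)\b/;
  var match = regex.exec(html);
  return match ? match[2] : null;
}

var html = "<ul><li>YM# 1234-234234         </li></ul>";
console.log(extractYmNumber(html)); // 1234-234234
```

Checking for `null` avoids a TypeError when the page has no matching `<li>`.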

Upvotes: 1

Jasper

Reputation: 11908

Currently, your regex only matches if there is a single digit before the dash and a single digit after it. This version matches one or more digits on each side instead (the \s* also allows the space that follows YM# in your HTML):

/YM#\s*[0-9]+-[0-9]+/g

Then you also need to capture the number, so wrap it in a capture group:

/YM#\s*([0-9]+-[0-9]+)/g

To read the capture group back, use RegExp.prototype.exec instead of String.match (with the g flag, String.match returns only the whole matches, not the groups):

var regex = /YM#\s*([0-9]+-[0-9]+)/g;
var match = regex.exec(text);
// match[0] is the match of the entire regex;
// after that, each capture group gets a number, so match[1] is the first group
var id = match ? match[1] : null;
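With the g flag, each exec call resumes from regex.lastIndex and returns null once no matches remain, so a loop can collect every YM# reference on the page. A minimal sketch, assuming the HTML is in a string and allowing optional whitespace after YM#; the helper name `extractAllYmNumbers` and the sample `text` are hypothetical:

```javascript
// Hypothetical helper: collects every number that follows "YM#".
function extractAllYmNumbers(text) {
  // Create the regex inside the function so lastIndex starts at 0 on every call.
  var regex = /YM#\s*([0-9]+-[0-9]+)/g;
  var ids = [];
  var match;
  // Each exec call continues from regex.lastIndex; null means no more matches.
  while ((match = regex.exec(text)) !== null) {
    ids.push(match[1]); // capture group 1 is the number
  }
  return ids;
}

var text = "<li>YM# 1234-234234 </li><li>YM# 99-5 </li>";
console.log(extractAllYmNumbers(text).join(", ")); // 1234-234234, 99-5
```

Creating the regex inside the function matters: a shared /g regex keeps its lastIndex between calls and can skip matches.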

Upvotes: 1

Jack

Reputation: 5768

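(?<=<li>YM#\s)[\d-]+

http://regexr.com?30ng5

This matches only the digits and dashes that directly follow "<li>YM# ": the (?<=...) lookbehind requires that prefix without including it in the match. Lookbehind is available in JavaScript since ES2018.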
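Because a lookbehind is not part of the match, no capture group is needed and the whole match is the number. A minimal sketch, assuming a JavaScript engine with ES2018 lookbehind support (e.g. a current browser or Node.js); the `html` string is a hypothetical sample:

```javascript
var html = "<li>YM# 1234-234234         </li>";
// (?<=...) requires "<li>YM# " right before the match without including it.
var match = html.match(/(?<=<li>YM#\s)[\d-]+/);
// Without the g flag, match[0] is the whole match (here, just the number).
var id = match ? match[0] : null;
console.log(id); // 1234-234234
```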

Upvotes: 1
