Reputation: 3905

Regex will not match

This is my string:

<link href="/post?page=4&amp;tags=example" rel="last" title="Last Page">

From this string I am trying to extract the 4 from the page parameter, using this regular expression:

link href="/post?page=(.*?)&amp;tags=(.*?)" rel="last"

I then take the 4 from the first capture group. The tags parameter uses a wildcard because its contents can change. However, the regex doesn't seem to match at all. Can anyone help?

I know I shouldn't use regex to parse HTML, but this is a small task and importing a large parsing module just for it would be a waste.

Upvotes: 1

Answers (4)

Niet the Dark Absol

Reputation: 324790

Assuming you are using a /regex literal/, you will need to escape the / in that path as \/.
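A minimal JavaScript sketch of the escaping, assuming the string is held exactly as shown in the question (including the &amp; entity). Escaping / alone is not enough for this particular pattern to match: the ? is a regex quantifier and also needs a backslash. The same pattern is shown both as a regex literal and through the RegExp constructor, where / needs no escaping but backslashes must be doubled inside the string.

```javascript
// The string from the question, exactly as shown (with the &amp; entity).
const html = '<link href="/post?page=4&amp;tags=example" rel="last" title="Last Page">';

// Regex literal: "/" is the delimiter, so each "/" in the pattern is written as "\/".
// The "?" is escaped too, because unescaped it is a quantifier, not a literal character.
const literalRe = /link href="\/post\?page=(.*?)&amp;tags=(.*?)" rel="last"/;

// RegExp constructor: "/" needs no escaping, but the backslash must be doubled
// in the string literal so that "\?" reaches the regex engine.
const ctorRe = new RegExp('link href="/post\\?page=(.*?)&amp;tags=(.*?)" rel="last"');

const m1 = html.match(literalRe);
const m2 = html.match(ctorRe);
console.log(m1[1], m1[2]); // 4 example
console.log(m2[1], m2[2]); // 4 example
```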

It also depends on how you are getting this string. Is it really typed that way, or is it part of an innerHTML that you are then reading back out? If it's the latter, the innerHTML won't be exactly what you expect, because the browser will "normalise" it.

If it is an innerHTML, it'd be far easier to get the element, read its href attribute, and run the regex on that.
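A rough sketch of that approach in JavaScript. pageFromHref is a hypothetical helper name, and it assumes you already have the href value, for example from document.querySelector('link[rel="last"]').getAttribute('href') in a browser. Because getAttribute returns the parsed value, &amp; has already become a plain &, so the pattern only needs to look for page= after ? or &.

```javascript
// Hypothetical helper: extracts the numeric page parameter from an href value.
// In a browser the value could come from
//   document.querySelector('link[rel="last"]').getAttribute('href')
// which returns the decoded attribute ("&amp;" already turned into "&").
function pageFromHref(href) {
  // Match "page=" right after "?" or "&", then capture the digits.
  const m = href.match(/[?&]page=(\d+)/);
  return m ? Number(m[1]) : null;
}

console.log(pageFromHref('/post?page=4&tags=example'));  // 4
console.log(pageFromHref('/post?tags=example&page=12')); // 12
console.log(pageFromHref('/post?tags=example'));         // null
```

Matching [?&] instead of just ? means the helper still works if page is not the first parameter.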

Upvotes: 3

Smileek

Reputation: 2797

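link href="/post\?page=(.*?)&amp;tags=(.*?)" rel="last"
You forgot the backslash before the ?. Unescaped, ? is a quantifier that makes the preceding t optional, so the pattern never matches the literal ? in the URL.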
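A quick JavaScript check of why the missing backslash matters (the href string is just the question's URL, used for illustration):

```javascript
const href = '/post?page=4';

// Unescaped: "?" makes the preceding "t" optional, so the pattern means
// "/pos", an optional "t", then "page=". The literal "?" in the input is never matched.
const unescaped = /\/post?page=(\d+)/.test(href);

// Escaped: "\?" matches a literal question mark.
const escaped = /\/post\?page=(\d+)/.test(href);

console.log(unescaped); // false
console.log(escaped);   // true
```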

Upvotes: 1

Lady Serena Kitty

Reputation: 59

I think it might be better to change your capture groups to something that matches everything up to the terminating character:

link href="/post\?page=([^&]+)&amp;tags=([^"]+)" rel="last"

Putting the negation character ^ first in a character class tells the regex engine to match any character EXCEPT the ones listed. This makes it easy to capture everything up to a terminating character, such as the ampersand or the double quote. Assuming you're using PHP or Java, this should also slightly improve performance, because the engine consumes the run in one greedy pass instead of extending a lazy .*? one character at a time.
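A small JavaScript sketch of the negated-class version, run against the question's string as shown (with &amp;). The ? before page is escaped here, since otherwise the pattern cannot match.

```javascript
const html = '<link href="/post?page=4&amp;tags=example" rel="last" title="Last Page">';

// [^&]+ : one or more characters that are not "&", so it stops at "&amp;".
// [^"]+ : one or more characters that are not a double quote, so it stops at the closing quote.
const re = /link href="\/post\?page=([^&]+)&amp;tags=([^"]+)" rel="last"/;

// The pattern matches this input, so destructuring the match array is safe here.
const [, page, tags] = html.match(re);
console.log(page, tags); // 4 example
```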

Upvotes: 1

speakr

Reputation: 4209

If the page parameter always comes first, try the PCRE /\?page=(\d+)/. Match group 1 will contain the page number.
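JavaScript accepts the same pattern unchanged, since \? and \d mean the same thing there as in PCRE. A minimal sketch, again assuming the string from the question:

```javascript
const html = '<link href="/post?page=4&amp;tags=example" rel="last" title="Last Page">';

// \? matches the literal "?", and \d+ captures one or more digits.
const m = html.match(/\?page=(\d+)/);
const page = m ? parseInt(m[1], 10) : null;
console.log(page); // 4
```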

Upvotes: 0
