luckasx

Reputation: 369

Jsoup Selector Regex matching

I want to select only the elements whose id matches the pattern "answer-[0-9]*".

I'm passing this selector to select(): "div[id~=answer-[0-9]*]"

The matching elements are:

<div class="post-text" id="answer-45881">

and

<div class="hidden modal modal-flag" id="answer-flag-modal45881">

What must I change to get only the first one?

Upvotes: 2

Views: 6034

Answers (3)

Pshemo

Reputation: 124275

Based on this example from the official jsoup selector documentation:

[attr~=regex]: elements with attribute values that match the regular expression; 
e.g. img[src~=(?i)\.(png|jpe?g)]

it looks like jsoup only checks whether some part of the attribute value can be matched by the regex (in that example, .png or .jpg), not whether the entire value matches it.

To require the regex to match the whole string, add anchors for the start of the string (^) and the end of the string ($).

Also, instead of * you should probably use + if you want the numeric part to be mandatory.

So try div[id~=^answer-[0-9]+$]
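To see why the anchors matter, here is a minimal sketch that uses plain java.util.regex rather than jsoup itself. It assumes jsoup applies the pattern with "find" semantics (a match anywhere in the value), which is what the observed behavior suggests; the class name IdRegexDemo and the helper found() are made up for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;

public class IdRegexDemo {
    // Assumption: mimics substring-style matching, i.e. returns true
    // if the regex can be found anywhere inside the value.
    static boolean found(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        // The two id values from the question.
        List<String> ids = List.of("answer-45881", "answer-flag-modal45881");
        for (String id : ids) {
            System.out.println(id
                + " unanchored=" + found("answer-[0-9]*", id)
                + " anchored=" + found("^answer-[0-9]+$", id));
        }
        // prints:
        // answer-45881 unanchored=true anchored=true
        // answer-flag-modal45881 unanchored=true anchored=false
    }
}
```

The unanchored pattern accepts both ids because "answer-" followed by zero digits already counts as a match; only the anchored version rejects the modal's id.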

Upvotes: 4

hwnd

Reputation: 70732

The * quantifier means "zero or more" times, so the pattern still matches the second element: "answer-" followed by zero digits is enough. Use the + quantifier instead, which means "one or more" times. Your selector would then be:

div[id~=answer-[0-9]+]
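A small sketch of the difference between the two quantifiers, again with plain java.util.regex and assuming jsoup looks for the pattern anywhere in the attribute value. The class name QuantifierDemo and the id "answer-45881-draft" are hypothetical and not taken from the page in the question.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Assumption: substring-style matching, true if the regex occurs anywhere in the value.
    static boolean found(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        String modalId = "answer-flag-modal45881";

        // '*' allows zero digits, so the bare prefix "answer-" satisfies the pattern.
        System.out.println(found("answer-[0-9]*", modalId)); // true

        // '+' needs at least one digit right after "answer-", but 'f' follows here.
        System.out.println(found("answer-[0-9]+", modalId)); // false

        // Hypothetical id: without anchors, trailing text after the digits is still accepted.
        System.out.println(found("answer-[0-9]+", "answer-45881-draft")); // true
    }
}
```

So + is enough for the two elements in the question, but if ids with extra text after the number can occur, the pattern also needs ^ and $ around it.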

Upvotes: 2

Maksim

Reputation: 264

It looks like it checks whether the id contains this pattern, not whether the whole id matches it.

"div[id~=answer-[0-9]*$]"

should work then.

Upvotes: 1
