Reputation: 3303
Is there any way to use regex or a "like" function for when I don't have the exact element id, but know the general format?
Currently I have
doc.css('table[id="UTA_basic"]//tbody')
but I'd like to find any table with an id like XYZ_basic, or even any table like _basic would work.
I'd be open to switching to xpath if needed.
Upvotes: 2
Views: 428
Reputation: 160621
Nokogiri supports the ability to create your own tag matchers for both CSS and XPath selectors.
For css:
Custom CSS pseudo classes may also be defined. To define custom pseudo classes, create a class and implement the custom pseudo class you want defined. The first argument to the method will be the current matching NodeSet. Any other arguments are ones that you pass in. For example:
node.css('title:regex("\w+")', Class.new {
  def regex node_set, regex
    node_set.find_all { |node| node['some_attribute'] =~ /#{regex}/ }
  end
}.new)
Similarly, for xpath:
Custom XPath functions may also be defined. To define custom functions create a class and implement the function you want to define. The first argument to the method will be the current matching NodeSet. Any other arguments are ones that you pass in. Note that this class may appear anywhere in the argument list. For example:
node.xpath('.//title[regex(., "\w+")]', Class.new {
  def regex node_set, regex
    node_set.find_all { |node| node['some_attribute'] =~ /#{regex}/ }
  end
}.new)
This ability looks like it'd let you dig into tags and parameters, but I haven't played with it to see how much it'd help.
About doc.css('table[id="UTA_basic"]//tbody'): that doesn't look like a CSS selector. The // makes it look like an XPath expression, and passing that to css will confuse Nokogiri. Also, be very sure the HTML being parsed actually has the tbody tags. Those are rarely used by people generating tables, but browsers love to put them in as they parse the HTML. Viewing the HTML source inside a browser will show them, but usually we don't include them in any sort of search since they won't be found in the source.
Upvotes: 3
Reputation: 89639
You can use the XPath function contains, which checks whether the id attribute contains the substring "_basic":
doc.xpath('//table[contains(@id, "_basic")]/tbody')
Note: this approach may give false positives if, for example, your document has table tags with ids such as _basical or _basic_1, since contains doesn't check the position of the substring or the characters after it, only its presence.
If you really need to be that precise, you can solve this problem by emulating the XPath 2.0 function ends-with like this:
doc.xpath('//table[substring(@id,string-length(@id)-string-length("_basic")+1)="_basic"]/tbody')
Upvotes: 1