Reputation: 16751
I want to remove all images from an HTML page (actually TinyMCE user input) that do not meet certain criteria (class = "int" or class = "ext"), and I'm struggling to find the correct approach. This is what I have so far:
hbody = Hpricot(input)
@internal_images = hbody.search("//img[@class='int']")
@external_images = hbody.search("//img[@class='ext']")
But I don't know how to find images where the class has the wrong value (not "int" or "ext").
I also have to loop over the elements to check other attributes that are not standard HTML (I use them for internal values such as the DB id, which I store in the attribute dbsrc). Can I access these attributes too, and is there a way to remove certain elements (which are in the Hpricot search result) when they don't meet my criteria?
Thanks for your help!
Upvotes: 1
Views: 999
Reputation: 176412
Check out the :not CSS selector.
(hbody/"img:not(.int)")
(hbody/"img:not(.ext)")
Unfortunately, it doesn't seem that you can chain :not expressions. You might want to fetch all img nodes and remove those whose class is neither int nor ext. Alternatively, you could use the difference operator to find the elements that belong to neither collection.
Use the .remove method to remove nodes or elements: see the Hpricot Altering documentation.
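A minimal sketch of the fetch-everything-and-filter idea, under some loud assumptions: plain hashes stand in for Hpricot elements so the snippet runs without the gem (both expose attributes through []), and the names ALLOWED_CLASSES, allowed_image? and partition_images are hypothetical helpers, not part of Hpricot. It also assumes the class attribute holds exactly one value.

```ruby
# Hypothetical helper names. Plain hashes stand in for Hpricot elements here;
# the only thing the helpers rely on is attribute access through [].
ALLOWED_CLASSES = %w[int ext].freeze

# True when the class attribute is exactly "int" or "ext".
# A missing class attribute (nil) counts as not allowed.
def allowed_image?(img)
  ALLOWED_CLASSES.include?(img[:class])
end

# Split a collection of images into [images to keep, images to drop].
def partition_images(images)
  images.partition { |img| allowed_image?(img) }
end

# Stand-in data: one image has no class, one has an unexpected class.
images = [
  { src: "foo", class: "int" },
  { src: "bar", dbsrc: "42" },
  { src: "baz", class: "other" },
  { src: "qux", class: "ext" }
]

keep, drop = partition_images(images)
puts keep.map { |img| img[:src] }.inspect
puts drop.map { |img| img[:src] }.inspect
```

With real input, the collection passed in would presumably be the result of hbody.search("img"), and the drop list would then hold the nodes to remove from the document.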
Upvotes: 2
Reputation: 14195
>> doc = Hpricot.parse('<html><img src="foo" class="int" /><img src="bar" bar="42" /><img src="foobar" class="int"></html>')
=> #<Hpricot::Doc {elem <html> {emptyelem <img class="int" src="foo">} {emptyelem <img src="bar" bar="42">} {emptyelem <img class="int" src="foobar">} </html>}>
>> doc.search("img")[1][:bar]
=> "42"
>> doc.search("img") - doc.search("img.int")
=> [{emptyelem <img src="bar" bar="42">}]
Once you have the results from search, you can use normal array operations on them. Nonstandard attributes are accessible through [].
Upvotes: 3