Reputation: 31484
Let's say I have this webpage and I'm considering the td element of the table containing the string Doe. Using Google Chrome I can get the CSS Path of that element:
#main > table:nth-child(6) > tbody > tr:nth-child(3) > td:nth-child(3)
Using that as a Jsoup CSS query returns the element I'm considering, as you can see here.
Is it possible with Jsoup to obtain the above CSS Path from an Element, or do I have to walk the tree manually to build it?
I know I could use the CSS query :containsOwn(text) with the own text of the Element, but that could also select other elements. The path, by contrast, includes only classes, ids and :nth-child(n).
This would be pretty useful for writing a semantic parser with Jsoup that can extract similar elements.
Upvotes: 1
Views: 1148
Jsoup doesn't seem to provide such a feature out-of-the-box. So I coded it:
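    // StringUtil lived in org.jsoup.helper when this was written;
    // newer jsoup releases keep it in org.jsoup.internal.
    import org.jsoup.helper.StringUtil;
    import org.jsoup.nodes.Element;

    public static String getCssPath(Element el) {
        if (el == null)
            return "";

        // An id is assumed to be unique, so it is a complete selector.
        if (!el.id().isEmpty())
            return "#" + el.id();

        // Otherwise start from tag.class1.class2
        StringBuilder selector = new StringBuilder(el.tagName());
        String classes = StringUtil.join(el.classNames(), ".");
        if (!classes.isEmpty())
            selector.append('.').append(classes);

        if (el.parent() == null)
            return selector.toString();

        selector.insert(0, " > ");
        // Add :nth-child(n) only when siblings match the same tag.class step.
        if (el.parent().select(selector.toString()).size() > 1)
            selector.append(String.format(
                    ":nth-child(%d)", el.elementSiblingIndex() + 1));

        // Prefix with the parent's path, built recursively.
        return getCssPath(el.parent()) + selector.toString();
    }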
I also created an issue and a pull request on the Jsoup repository to extend the Element class with that method. Comment on them or subscribe if you want it in Jsoup.
My pull request was merged into jsoup version 1.8.1. The Element class now has the method cssSelector, which returns a CSS Path that can be used in a selector to retrieve the element:
Get a CSS selector that will uniquely select this element. If the element has an ID, returns #id; otherwise returns the parent (if any) CSS selector, followed by '>', followed by a unique selector for the element (tag.class.class:nth-child(n)).
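To see what strings the rule quoted above produces, here is a minimal sketch that does not depend on jsoup at all. Node, cssPath and matches are hypothetical names made up for this illustration; this is not jsoup's implementation, just the same id / tag.class / :nth-child(n) logic applied to a tiny hand-built tree shaped like the table in the question.

```java
import java.util.ArrayList;
import java.util.List;

class CssPathSketch {
    /** Hypothetical stand-in for an HTML element; not a jsoup class. */
    static final class Node {
        final String tag;
        final String id;               // empty string when the element has no id
        final List<String> classes;
        final List<Node> children = new ArrayList<>();
        Node parent;

        Node(String tag, String id, String... classes) {
            this.tag = tag;
            this.id = id;
            this.classes = List.of(classes);
        }

        /** Appends a child, sets its parent, and returns the child. */
        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return child;
        }
    }

    /** True if sibling s would be matched by el's tag.class step. */
    static boolean matches(Node s, Node el) {
        return s.tag.equals(el.tag) && s.classes.containsAll(el.classes);
    }

    static String cssPath(Node el) {
        if (el == null) return "";
        if (!el.id.isEmpty()) return "#" + el.id;   // id short-circuits the walk

        StringBuilder step = new StringBuilder(el.tag);
        for (String c : el.classes) step.append('.').append(c);
        if (el.parent == null) return step.toString();

        // Disambiguate with :nth-child(n) only if another sibling matches too.
        long same = el.parent.children.stream().filter(s -> matches(s, el)).count();
        if (same > 1) {
            int n = el.parent.children.indexOf(el) + 1;  // 1-based position
            step.append(":nth-child(").append(n).append(')');
        }
        return cssPath(el.parent) + " > " + step;
    }

    public static void main(String[] args) {
        Node main = new Node("div", "main");
        Node tbody = main.add(new Node("table", "")).add(new Node("tbody", ""));
        tbody.add(new Node("tr", ""));
        tbody.add(new Node("tr", ""));
        Node row = tbody.add(new Node("tr", ""));
        row.add(new Node("td", ""));
        row.add(new Node("td", ""));
        Node cell = row.add(new Node("td", ""));
        // prints "#main > table > tbody > tr:nth-child(3) > td:nth-child(3)"
        System.out.println(cssPath(cell));
    }
}
```

Unlike Chrome's "Copy CSS Path", this only adds :nth-child(n) where a sibling would otherwise match the same step, which is why the table step has no index in this toy tree.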
Upvotes: 1