enrico.bacis

Find CSS Path from JSoup Element

Let's say I have this webpage and I'm interested in the td element of the table that contains the string Doe. Using Google Chrome, I can get the CSS path of that element:

#main > table:nth-child(6) > tbody > tr:nth-child(3) > td:nth-child(3)

Using that path as a Jsoup CSS query returns the element I'm interested in, as you can see here. Is it possible with Jsoup to obtain the above CSS path from an Element, or do I have to walk the tree manually to build it?

I know I could use the CSS query :containsOwn(text) with the element's own text, but that could also select other elements; the path, instead, uses only tag names, classes, ids and :nth-child(n).
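To illustrate the difference, here is a minimal sketch on a made-up page (the HTML is hypothetical, written only to mirror the structure of the path above, not taken from the real page). The positional path matches exactly one cell, while :containsOwn(Doe) matches every cell whose own text contains Doe.

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

public class PathVsContainsOwn {
    // Hypothetical page: the table is the 6th child of #main and "Doe" appears in two rows.
    static final String HTML = "<div id='main'>"
            + "<p>1</p><p>2</p><p>3</p><p>4</p><p>5</p>"
            + "<table>"
            + "<tr><td>1</td><td>John</td><td>Smith</td></tr>"
            + "<tr><td>2</td><td>Jane</td><td>Doe</td></tr>"
            + "<tr><td>3</td><td>John</td><td>Doe</td></tr>"
            + "</table></div>";

    static final String PATH =
            "#main > table:nth-child(6) > tbody > tr:nth-child(3) > td:nth-child(3)";

    // The HTML parser inserts the implicit <tbody>, so the Chrome-style path still matches.
    static Elements byPath() {
        return Jsoup.parse(HTML).select(PATH);
    }

    // Matches every td whose own text contains "Doe" (case-insensitive), i.e. both rows.
    static Elements byOwnText() {
        return Jsoup.parse(HTML).select("td:containsOwn(Doe)");
    }

    public static void main(String[] args) {
        System.out.println(byPath().size());    // 1
        System.out.println(byPath().text());    // Doe
        System.out.println(byOwnText().size()); // 2
    }
}
```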

This would be very useful for writing a semantic parser with Jsoup that can extract similar elements.

Answers (1)

enrico.bacis

Jsoup doesn't seem to provide this feature out of the box, so I wrote it myself:

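import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public static String getCssPath(Element el) {
    if (el == null)
        return "";

    // An id is assumed to be unique in the page, so it is enough on its own
    if (!el.id().isEmpty())
        return "#" + el.id();

    // Tag name followed by all the classes: tag.class1.class2
    StringBuilder selector = new StringBuilder(el.tagName());
    String classes = String.join(".", el.classNames());
    if (!classes.isEmpty())
        selector.append('.').append(classes);

    // Stop at the top: the Document node (tag "#root") is not a real element
    // and must not appear in the path, otherwise the selector matches nothing
    if (el.parent() == null || el.parent() instanceof Document)
        return selector.toString();

    // Add :nth-child(n) only if other children of the parent match the same tag.class
    selector.insert(0, " > ");
    if (el.parent().select(selector.toString()).size() > 1)
        selector.append(":nth-child(").append(el.elementSiblingIndex() + 1).append(')');

    return getCssPath(el.parent()) + selector.toString();
}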

I also opened an issue and a pull request on the jsoup repository to add this method to the Element class. Comment on them or subscribe if you want it in jsoup.

UPDATE

My pull request was merged into jsoup 1.8.1. The Element class now has a cssSelector() method, which returns a CSS path that can be used as a selector to retrieve the element. Its documentation reads:

Get a CSS selector that will uniquely select this element. If the element has an ID, returns #id; otherwise returns the parent (if any) CSS selector, followed by '>', followed by a unique selector for the element (tag.class.class:nth-child(n)).
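With jsoup 1.8.1 or later, a round trip can be checked directly: compute the selector with cssSelector() and run it again against the same document. A minimal sketch on a hypothetical page; since the exact selector string may vary between jsoup versions, it only checks that the selector selects exactly the original element.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class CssSelectorRoundTrip {
    // Hypothetical page, not the one from the question.
    static final String HTML = "<div id='main'><table>"
            + "<tr><td>John</td><td>Smith</td></tr>"
            + "<tr><td>Jane</td><td>Doe</td></tr>"
            + "</table></div>";

    // Returns true if the generated selector matches the original element and nothing else.
    static boolean roundTrips() {
        Document doc = Jsoup.parse(HTML);
        Element cell = doc.select("td:containsOwn(Doe)").first();
        String selector = cell.cssSelector(); // available since jsoup 1.8.1
        Elements found = doc.select(selector);
        // Identity comparison: it must be the very same node, not just an equal-looking one
        return found.size() == 1 && found.first() == cell;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(doc.select("td:containsOwn(Doe)").first().cssSelector());
        System.out.println(roundTrips()); // true
    }
}
```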
