GregMa

Reputation: 740

java find table using jsoup and equivalent xpath

Here is the HTML code:

<table class="textfont" cellspacing="0" cellpadding="0" width="100%" align="center" border="0">
    <tbody>
        <tr>
            <td class="chl" width="20%">Batch ID</td><td class="ctext">d32654464bdb424396f6a91f2af29ecf</td>
        </tr>
        <tr>
            <td class="chl" width="20%">ALM Server</td>
            <td class="ctext"></td>
        </tr>
        <tr>
            <td class="chl" width="20%">ALM Domain/Project</td>
            <td class="ctext">EBUSINESS/STERLING</td>
        </tr>
        <tr>
            <td class="chl" width="20%">TestSet URL</td>
            <td class="ctext">almtestset://<a href="http://localhost.com">localhost</a></td>
        </tr>
        <tr>
            <td class="chl" width="20%">Tests Executed</td>
            <td class="ctext"><b>6</b></td>
        </tr>
        <tr>
            <td class="chl" width="20%">Start Time</td>
            <td class="ctext">08/31/2017 12:20:46 PM</td>
        </tr>
        <tr>
            <td class="chl" width="20%">Finish Time</td>
            <td class="ctext">08/31/2017 02:31:46 PM</td>
        </tr>
        <tr>
            <td class="chl" width="20%">Total Duration</td>
            <td class="ctext"><b>2h 11m </b></td>
        </tr>
        <tr>
            <td class="chl" width="20%">Test Parameters</td>
            <td class="ctext"><b>{&quot;browser&quot;:&quot;chrome&quot;,&quot;browser-version&quot;:&quot;56&quot;,&quot;language&quot;:&quot;english&quot;,&quot;country&quot;:&quot;US&quot;}</b></td>
        </tr>
        <tr>
            <td class="chl" width="20%">Passed</td>
            <td class="ctext" style="color:#269900"><b>0</b></td>
        </tr>
        <tr>
            <td class="chl" width="20%">Failed</td>
            <td class="ctext" style="color:#990000"><b>6</b></td>
        </tr>
        <tr>
            <td class="chl" width="20%">Not Completed</td>
            <td class="ctext" style="color: ##ff8000;"><b>0</b></td>
        </tr>
        <tr>
            <td class="chl" width="20%">Test Pass %</td>
            <td class="ctext" style="color:#990000;font-size:14px"><b>0.0%</b></td>
        </tr>
    </tbody>
</table>

And here is the xpath to get the table:

//td[text() = 'TestSet URL']/ancestor::table[1]

How can I get this table using jSoup? I've tried:

tableElements = doc.select("td:contains('TestSet URL')");

to get the child element, but that doesn't work and returns null. I need to find the table and put all the children into a map. Any help would be greatly appreciated!

Upvotes: 0

Views: 700

Answers (2)

glytching

Reputation: 47895

The following code parses your table into a map. It relies on a few assumptions:

  • The xpath //td[text() = 'TestSet URL']/ancestor::table[1] selects the nearest table enclosing a td whose text is exactly "TestSet URL". The JSoup code in getTable() is a cruder approximation of that xpath: it returns the first table whose text contains "TestSet URL" anywhere in its body. This is a little brittle, but for the HTML in your question it finds the same table
  • The code below assumes that every row contains exactly two cells, the first being the key and the second being the value. Since you want to parse the table content into a map, this assumption seems valid
  • The code below throws an exception if these assumptions are not met, i.e. if the given HTML does not contain a table with "TestSet URL" in its body, or if any row within that table does not contain exactly two cells.

If those assumptions are invalid then the internals of getTable and parseTable will change but the general approach will remain valid.

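// imports needed by the two methods below
import java.util.HashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public void parseTable() {
    // 'html' is assumed to be a String holding the HTML from the question
    Document doc = Jsoup.parse(html);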

    // declare a holder for the 'mapped rows'; a map is used on the assumption that every row represents a discrete key:value pair
    Map<String, String> asMap = new HashMap<>();
    Element table = getTable(doc);

    // now walk through the rows, adding one map entry per row
    Elements rows = table.select("tr");
    for (int i = 0; i < rows.size(); i++) {
        Element row = rows.get(i);
        Elements cols = row.select("td");

        // expecting this table to consist of key:value pairs where the first cell is the key and the second cell is the value
        if (cols.size() == 2) {
            asMap.put(cols.get(0).text(), cols.get(1).text());
        } else {
            throw new RuntimeException(String.format("Cannot parse the table row: %s to a key:value pair because it contains %s cells!", row.text(), cols.size()));
        }
    }
    System.out.println(asMap);
}

private Element getTable(Document doc) {
    Elements tables = doc.select("table");
    for (int i = 0; i < tables.size(); i++) {
        // the xpath //td[text() = 'TestSet URL']/ancestor::table[1] selects the table enclosing the matching td;
        // this crude check approximates it by returning the first table whose text
        // contains "TestSet URL" anywhere in its body
        if (tables.get(i).text().contains("TestSet URL")) {
            return tables.get(i);
        }
    }
    throw new RuntimeException("Cannot find a table element which contains 'TestSet URL'!");
}

For the HTML posted in your question, the above code will output:

{Finish Time=08/31/2017 02:31:46 PM, Passed=0, Test Parameters={"browser":"chrome","browser-version":"56","language":"english","country":"US"}, TestSet URL=almtestset://localhost, Failed=6, Test Pass %=0.0%, Not Completed=0, Start Time=08/31/2017 12:20:46 PM, Total Duration=2h 11m, Tests Executed=6, ALM Domain/Project=EBUSINESS/STERLING, Batch ID=d32654464bdb424396f6a91f2af29ecf, ALM Server=}    
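
Note that the entries above do not appear in table-row order, because HashMap makes no guarantee about iteration order. If row order matters, one option is to use a LinkedHashMap instead, which iterates in insertion order. A minimal sketch of the difference, where the RowOrderDemo class, the toOrderedMap helper and the sample rows are hypothetical stand-ins for the parsed cells:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RowOrderDemo {

    // Copies key/value pairs into a LinkedHashMap, which iterates in insertion order.
    // Each inner array is assumed to hold exactly two strings: {key, value}.
    static Map<String, String> toOrderedMap(String[][] rows) {
        Map<String, String> ordered = new LinkedHashMap<>();
        for (String[] row : rows) {
            ordered.put(row[0], row[1]);
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the first three rows of the table
        String[][] rows = {
            {"Batch ID", "d32654464bdb424396f6a91f2af29ecf"},
            {"ALM Server", ""},
            {"ALM Domain/Project", "EBUSINESS/STERLING"}
        };
        // prints {Batch ID=d32654464bdb424396f6a91f2af29ecf, ALM Server=, ALM Domain/Project=EBUSINESS/STERLING}
        System.out.println(toOrderedMap(rows));
    }
}
```

In parseTable() the equivalent change would be declaring asMap as new LinkedHashMap<>() rather than new HashMap<>().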

Upvotes: 1

Eritrean

Reputation: 16498

You have to remove the quotation marks inside :contains() to match the cell with that text; just use

tableElements = doc.select("td:contains(TestSet URL)");

Note, however, that this selects only the td elements which contain the text "TestSet URL". To select the whole table, use

Element table = doc.select("table.textfont").first();

This selects tables with class="textfont". Since several tables can share the same class value, you have to specify which one to take, hence first().

To get all the tr elements:

    Elements tableRows = doc.select("table.textfont tr");
    for (Element e : tableRows) {
        System.out.println(e);
    }
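
If the class name is not a reliable handle, another option is to start from the matched cell and walk up to its enclosing table, which is closer in spirit to the ancestor::table[1] step of the xpath. The following is only a sketch: the EnclosingTableFinder class and findEnclosingTable helper are hypothetical names, and :containsOwn does a case-insensitive substring match on the cell's own text, so it is looser than the exact comparison text() = 'TestSet URL'.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class EnclosingTableFinder {

    // Returns the nearest <table> ancestor of the first <td> whose own text
    // contains the given label, or null when no such cell or table exists.
    // Assumption: the label contains no parentheses, which would break the selector.
    static Element findEnclosingTable(Document doc, String label) {
        // :containsOwn looks only at the element's own text, not that of its descendants
        Element cell = doc.select("td:containsOwn(" + label + ")").first();
        if (cell == null) {
            return null;
        }
        // parents() is ordered from the closest ancestor outwards
        for (Element parent : cell.parents()) {
            if (parent.tagName().equals("table")) {
                return parent;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical nested-table input, to show that the innermost table is returned
        String html = "<table class=\"outer\"><tr><td>"
                + "<table class=\"textfont\"><tr><td>TestSet URL</td><td>x</td></tr></table>"
                + "</td></tr></table>";
        Element table = findEnclosingTable(Jsoup.parse(html), "TestSet URL");
        System.out.println(table.className()); // prints textfont
    }
}
```

Unlike a check on the whole table's text, this approach picks the innermost table when tables are nested.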

Upvotes: 0
