Erba Aitbayev

Reputation: 4333

How to extract all rows with concatenated cells from a table using Xpath?

I have an HTML table:

<table class="info">
<tbody>
    <tr><td class="name">Year</td><td>2011</td></tr>
    <tr><td class="name">Storey</td><td>3</td></tr>
    <tr><td class="name">Area</td><td>170</td></tr>
    <tr><td class="name">Condition</td><td>Renovated</td></tr>
    <tr><td class="name">Bathroom</td><td>2</td></tr>
</tbody>
</table>

In this table, each row contains two cells enclosed in <td> tags. The first cell names the kind of data (for example, the year the house was built), and the second cell holds the value itself (here, 2011).

I want to extract the data so that each name stays paired with its value, like this:

Year - 2011
Storey - 3
Area - 170
Condition - Renovated
Bathroom - 2

For now I am using XPath's concat function. Here is my XPath expression:

concat(//table[@class="info"]//tr//td[contains(@class, 'name')]/text()  , ' - ', //table[@class="info"]//tr//td[not(contains(@class, 'name'))]/text())

This XPath returns this result:

Year - 2011

My table contains 5 rows, but the expression returned only the first row with its cells concatenated.

However, the two XPath expressions that I pass to concat each return all rows when evaluated on their own.

These are the 2 XPath expressions:

//table[@class="info"]//tr//td[contains(@class, 'name')]/text()

and

//table[@class="info"]//tr//td[not(contains(@class, 'name'))]/text()

Both of these expressions return all rows with the required information. When I pass the two expressions to concat, it returns only the first row.

How can I get all rows with concatenated cells using XPath? I suspect it is not possible with XPath alone. Do I have to use a programming language such as PHP, or could a newer version of XPath or a more sophisticated expression help here?

Upvotes: 1

Views: 568

Answers (1)

If you use Java:

1. Get a DOM document.

2. Loop over the rows by index and evaluate concat once per row:

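  // Assumes "xpath" is a javax.xml.xpath.XPath and "document" is the parsed DOM.
  int i = 1;
  while (true) {
      // Path to the i-th row of the table from the question (XPath indexes start at 1).
      String row = "//table[@class='info']//tr[" + i + "]";
      // Stop as soon as there is no i-th row.
      if (xpath.compile(row).evaluate(document, XPathConstants.NODE) == null) break;

      // Each argument now matches a single cell, so concat returns this row's pair.
      XPathExpression expr = xpath.compile(
          "concat(" + row + "/td[@class='name']/text(), ' - ', " + row + "/td[not(@class='name')]/text())");
      String resX = (String) expr.evaluate(document, XPathConstants.STRING);
      System.out.println(resX);
      i++;
  }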
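The loop is needed because in XPath 1.0, concat converts each node-set argument to a string by taking only the first node in document order, which is why the original expression yields just "Year - 2011". Below is a minimal self-contained sketch of the index-based idea that asks XPath for the row count with count() instead of probing until a lookup fails. The class name IndexedRowConcat, the SAMPLE constant, and the extractRows helper are hypothetical names for illustration, and the sketch assumes the markup is well-formed enough for an XML parser (real-world HTML often is not).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class IndexedRowConcat {
    // The table from the question, written as well-formed XML (assumption).
    static final String SAMPLE =
        "<table class=\"info\"><tbody>"
        + "<tr><td class=\"name\">Year</td><td>2011</td></tr>"
        + "<tr><td class=\"name\">Storey</td><td>3</td></tr>"
        + "<tr><td class=\"name\">Area</td><td>170</td></tr>"
        + "<tr><td class=\"name\">Condition</td><td>Renovated</td></tr>"
        + "<tr><td class=\"name\">Bathroom</td><td>2</td></tr>"
        + "</tbody></table>";

    // Hypothetical helper: returns one "name - value" string per table row.
    static List<String> extractRows(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            String rowsPath = "//table[@class='info']//tr";
            // count() yields an XPath number, which Java returns as a Double.
            int rowCount = ((Double) xpath.evaluate(
                "count(" + rowsPath + ")", document, XPathConstants.NUMBER)).intValue();

            List<String> result = new ArrayList<>();
            for (int i = 1; i <= rowCount; i++) {
                // Parenthesised so that [i] picks the i-th row of the whole result set.
                String row = "(" + rowsPath + ")[" + i + "]";
                result.add(xpath.evaluate(
                    "concat(" + row + "/td[@class='name'], ' - ', "
                        + row + "/td[not(@class='name')])",
                    document));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        for (String line : extractRows(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

Each iteration re-scans the document from the root, so this is fine for a small table but does more work than necessary on large ones.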

Another option:

Get every tr as a node list:

String expression = "//table[@class=\"info\"]//tr";
XPathExpression expr = xpath.compile(expression);
NodeList nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);

Then, for each row, evaluate concat relative to that row:

  for (int temp1 = 0; temp1 < nodes.getLength(); temp1++) {
      Node nodeSegment = nodes.item(temp1);
      if (nodeSegment.getNodeType() == Node.ELEMENT_NODE) {
          // ...
          // Relative paths (no leading //) are resolved against the current row.
          expr = xpath.compile("concat(td[@class='name']/text(), ' - ', td[not(@class='name')]/text())");
          String resX = (String) expr.evaluate(nodeSegment, XPathConstants.STRING);
          System.out.println(resX);
      }
  }
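For reference, here is a minimal end-to-end sketch of this second option: select the rows once, then evaluate a relative concat expression with each row as the context node. The class name PerRowConcat, the SAMPLE constant, and the extractRows helper are hypothetical names for illustration, and the sketch assumes the table can be parsed as well-formed XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PerRowConcat {
    // The table from the question, written as well-formed XML (assumption).
    static final String SAMPLE =
        "<table class=\"info\"><tbody>"
        + "<tr><td class=\"name\">Year</td><td>2011</td></tr>"
        + "<tr><td class=\"name\">Storey</td><td>3</td></tr>"
        + "<tr><td class=\"name\">Area</td><td>170</td></tr>"
        + "<tr><td class=\"name\">Condition</td><td>Renovated</td></tr>"
        + "<tr><td class=\"name\">Bathroom</td><td>2</td></tr>"
        + "</tbody></table>";

    // Hypothetical helper: returns one "name - value" string per table row.
    static List<String> extractRows(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Select all rows of the table in a single query.
            NodeList rows = (NodeList) xpath.evaluate(
                "//table[@class='info']//tr", document, XPathConstants.NODESET);

            // Compiled once; the relative paths resolve against whichever row is passed in.
            XPathExpression rowExpr = xpath.compile(
                "concat(td[@class='name'], ' - ', td[not(@class='name')])");

            List<String> result = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                // The row node is the context item, so only its own cells are matched.
                result.add(rowExpr.evaluate(rows.item(i)));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        for (String line : extractRows(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

Compared with the index-based loop, this version queries the document for rows only once and reuses a single compiled expression, which tends to scale better for larger tables.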

Upvotes: 2
