Morphan
Morphan

Reputation: 41

Why does IMPORTXML with XPATH return unexpected blank row in addition to expected result?

I'm importing into Google Sheets with IMPORTXML with the following XPATH:

=IMPORTXML(A2;"//*[@id='mw-content-text']/div/table[1]/tbody/tr[4]/td[1]/ul/li")

A2 containing the URL (https://stt.wiki/wiki/20th_Century_Pistol).

From the website I want to import the list entries in the "Basic" column and "Crafted From" row of the table.

There are only two list entries in this section of the table:

Therefore, I expected to get only those two list entries as rows in my sheet. Instead, I got an additional blank row above those two entries. When I change "td[1]" to "td[3]" in the XPATH query however, there are no extra blanks.

I don't understand where the additional blank row is coming from and how I can avoid it.

Google Sheet with desired and actual result

Upvotes: 0

Views: 1060

Answers (2)

Tanaike
Tanaike

Reputation: 201338

When I saw the HTML of the URL, there are 2 li tags in the ul tag. So I think that your xpath is correct. But from your issue, I was worry that the sup tag might affect to this situation. But I'm not sure whether this is the direct reason. So I would like to propose to add the attribute of li for your xpath as follows.

Modified xpath:

When your xpath is modified, please modify as follows.

From:
//*[@id='mw-content-text']/div/table[1]/tbody/tr[4]/td[1]/ul/li
To:
//*[@id='mw-content-text']/div/table[1]/tbody/tr[4]/td[1]/ul/li[@style='white-space:nowrap']
  • By adding [@style='white-space:nowrap'], the value of li with style='white-space:nowrap' is retrieved.

Result:

The formula is =IMPORTXML(A1;"//*[@id='mw-content-text']/div/table[1]/tbody/tr[4]/td[1]/ul/li[@style='white-space:nowrap']"). Please put the URL to the cell "A1".

enter image description here

Note:

  • Also, you can use the xpath of //*[@id='mw-content-text']/div/table[1]/tbody/tr[4]/td[1]/ul/li[position()>1].

Upvotes: 2

E.Wiest
E.Wiest

Reputation: 5905

To complete the very neat @Tanaike's answer, another expression :

=IMPORTXML(A2;"//th[contains(.,'Crafted')]/following::td[1]//li[contains(@style,'white')]")

If a blank line is added it's because GoogleSheets parses an additional blank li element containing a @style attribute.

Upvotes: 1

Related Questions