Digital Farmer
Digital Farmer

Reputation: 2107

Import tables where the title contains the word "table" - Google Sheets

On the pages of that site, these tables appear 1 or 2 times on the right side, I would like to know if there is any way to tell IMPORTXML or IMPORTHTML to import the tables containing the word "table" such as: "xxxxxxxxxx table" and "yyyyyyyyyyy table".

Because there are several tables on the page, if explain to the IMPORT which tables I am looking for, it would not need to import all of them.

Example Page:
https://int.soccerway.com/matches/2020/03/03/england/fa-cup/reading-fc/sheffield-united-fc/3221322/

enter image description here enter image description here enter image description here

enter image description here

Obs.: The pages changes the number of tables according to the existing data, so I can't use something fixed like:

=IMPORTHTML("LINK","table",9)
=IMPORTHTML("LINK","table",10)

Sheet link:
https://docs.google.com/spreadsheets/d/1Dmql4y03xaJoh_wU01XoMM4b3ZSpF-biT7-qGXkyt7Y/edit?usp=sharing

Upvotes: 0

Views: 221

Answers (1)

Tanaike
Tanaike

Reputation: 201378

I believe your goal as follows.

  • You want to use IMPORTXML or IMPORTHTML when "Championship table" and "Premier League table" are existing in the HTML data.
  • You want to achieve this using the built-in formulas of Google Spreadsheet.

For this, how about this answer?

In this answer, I would like to propose the following flow.

  1. Retrieve the values of "Championship table" and "Premier League table" from the URL.
  2. Join the retrieved values with "" and compare it with Championship tablePremier League table.
    • When "Championship table" and "Premier League table" are existing, TRUE is returned.
    • From HTML data, I thought that the order of "Championship table" and "Premier League table" is the constant.

The sample formula is as follows. The URL is put to the cell "A1".

=IF(TEXTJOIN("",TRUE,IMPORTXML(A1,"//h2/a"))="Championship tablePremier League table",TRUE,FALSE)
  • In this case, when "Championship table" and "Premier League table" are existing, TRUE is returned.

Sample formula:

When this formula is used, when "Championship table" and "Premier League table" are existing, as a sample, you want to use IMPORTXML(A1,"//table/tr"), it becomes as follows.

=IF(TEXTJOIN("",TRUE,IMPORTXML(A1,"//h2/a"))="Championship tablePremier League table",IMPORTXML(A1,"//table/tr"),"")
  • The URL is put to the cell "A1".
  • In this case, when "Championship table" and "Premier League table" are NOT existing, no values are shown.

Result:

enter image description here

References:

Added:

For your updated question, how about the following answer?

From your question, I thought that when xxxxxxxxxx table and yyyyyyyyyyy table are existing, the 2 tables are existing. When I saw the HTML data, when xxxxxxxxxx table and yyyyyyyyyyy table are existing, it seems that the tables which has leaguetable sortable table as the class value are existing. I thought that this might be used for your situation. So how about the following sample formula?

Sample formula:

=IF(AND(REGEXMATCH(INDEX(IMPORTXML(A1,"//h2/a"),1),"table"),REGEXMATCH(INDEX(IMPORTXML(A1,"//h2/a"),2),"table")),IMPORTXML(A1,"//table[contains(@class,'leaguetable sortable table')]//tr"),"")

Result:

enter image description here

Note:

  • In this case, =IFERROR(IMPORTXML(A1,"//table[contains(@class,'leaguetable sortable table')]//tr"),"") might be able to be used. About this, please test this for the case without the values of xxxxxxxxxx table and yyyyyyyyyyy table.

Upvotes: 2

Related Questions