eirever
eirever

Reputation: 143

how to use IMPORTHTML for an array of urls

Based on responses to a previous question, it seems that it is not possible to import all rows of an HTML chart from a website that uses javascript to auto expand when a user scrolls.

The answer to the first version of this question resulted in using an array of IMPORTHTML for the range of a QUERY function. As this worked to return results, I adjusted the urls so the tables would populate more as expected; this meant having far more urls. I decided to build the urls with an ARRAYFORMULA on the ref! tab (link below).

I then individually updated the rows of the QUERY range array to produce the following formula

=query( 
  {importhtml(ref!B5, ref!B1, ref!B2); 
    importhtml(ref!B6, ref!B1, ref!B2);
    importhtml(ref!B7, ref!B1, ref!B2);
    importhtml(ref!B8, ref!B1, ref!B2);
    importhtml(ref!B9, ref!B1, ref!B2);
    importhtml(ref!B10, ref!B1, ref!B2);
    importhtml(ref!B11, ref!B1, ref!B2);
    importhtml(ref!B12, ref!B1, ref!B2);
    importhtml(ref!B13, ref!B1, ref!B2);
    importhtml(ref!B14, ref!B1, ref!B2);
    importhtml(ref!B15, ref!B1, ref!B2);
    importhtml(ref!B16, ref!B1, ref!B2) 
    }, 
  "where Col1 is not null", 1)

UPDATED QUESTION: Is there a more elegant solution for building the QUERY range array?

Some of the urls created in the ref! tab are empty but could come into use. The hope is that there might be a way to build the function so that it has a level of "future proofing" for potential expansion. While a function in cell is the desired solution, if this requires Apps Script (adding tag), I am open to strategies that might achieve the end goal.

https://docs.google.com/spreadsheets/d/14jHGRyHKf866jrZiIfX2-GX6hZ2SE6vi_DezckpXaRs/edit#gid=7490361

Any guidance on how to do this successfully (and possibly elegantly) is greatly appreciated.


Text from previous versions of this question included below.

As a workaround, ~~I pulled the individual urls from each page of the chart with the settings to the maximum rows displayed at a time (i.e., 100). I have placed all the urls in a sheet and want to populate a single the IMPORTHTML results using a single formula. The following formulas do not result in the desired outcome.~~

=IMPORTHTML({D3:D6},A3,B3)
=ARRAYFORMULA(IMPORTHTML({D3:D9},A3,B3))

Both of these options result in a partial return of only the first url.

={{IMPORTHTML(D3,A3,B3)};{IMPORTHTML(D4,A3,B3)};{IMPORTHTML(D5,A3,B3)};{IMPORTHTML(D6,A3,B3)}}

This option results in what appears to be a full return of the first url with some odd filler rows as well as a partial return from the second url. This is the option currently on the arrayImport tab.

Upvotes: 0

Views: 698

Answers (1)

doubleunary
doubleunary

Reputation: 18784

To weed out repeated header rows, try this:

=query( 
  { 
    importhtml(D3, A3, B3); 
    importhtml(D4, A3, B3); 
    importhtml(D5, A3, B3); 
    importhtml(D6, A3, B3) 
  }, 
  "where Col1 is not null", 
  1 
)

To import data from a list of addresses where the number of URLs changes, you can try the importDataFromManySpreadsheets script.

Upvotes: 1

Related Questions