I am trying to read a table from a URL using pandas read_html, but the table I'm interested in is loaded after the rest of the page, so the DataFrame I get looks like this instead of the actual content:
ColumnA | ColumnB
--- | ---
Still loading | Still loading
Is there a way to tell read_html to wait until the table has fully loaded and only then read it?
There's no way to answer for sure without a specific code example, but be aware that read_html parses the static HTML exactly as the server delivers it. It doesn't wait for JavaScript to run (which is most likely what you see happening in the browser when the table "loads"), because it never executes JavaScript at all.
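One quick way to confirm this diagnosis is to look at the raw HTML yourself. The sketch below deliberately uses the standard library's html.parser instead of pandas; like read_html, it never runs scripts, so it shows roughly what any non-JavaScript parser sees. The class and function names are hypothetical, and the sample page is made up to mirror the placeholder table in the question.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class CellTextCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell in a static HTML document."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
            self.cells[-1] = self.cells[-1].strip()

    def handle_data(self, data):
        # Text inside <script> is reported here too, but it is outside any cell,
        # so it is ignored; the script itself is never executed.
        if self._in_cell:
            self.cells[-1] += data


def static_cells(html):
    """Return cell texts as they appear in the HTML, before any JavaScript runs."""
    parser = CellTextCollector()
    parser.feed(html)
    parser.close()
    return parser.cells


def fetch_static_html(url):
    """Plain HTTP GET with no JavaScript: roughly what read_html receives for a URL."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


page = """
<table id="data">
  <tr><th>ColumnA</th><th>ColumnB</th></tr>
  <tr><td>Still loading</td><td>Still loading</td></tr>
</table>
<script>
  // In a browser, this would replace the placeholder cells after page load.
  fetch("/api/rows").then(r => r.json()).then(fillTable);
</script>
"""
print(static_cells(page))
# ['ColumnA', 'ColumnB', 'Still loading', 'Still loading']
```

If static_cells(fetch_static_html(url)) on the real page also shows only "Still loading", the data is being injected by JavaScript and no read_html option will change that.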
You can read more about common HTML-scraping gotchas with pandas here, though those relate more to performance than to waiting for a secondary page update.
If you need to capture JavaScript updates in your crawl, you will probably need a headless browser, for example one driven by Selenium [docs] or headless Chrome [related question].
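A minimal sketch of the Selenium route, assuming Selenium 4 with a local Chrome install, and assuming the page shows the literal text "Still loading" until its script fills the table: load the page in headless Chrome, wait until the placeholder is gone, then pass the rendered HTML (driver.page_source) to read_html. PLACEHOLDER, placeholder_gone, and read_tables_after_js are illustrative names, not part of pandas or Selenium.

```python
from io import StringIO

# Assumption: the text the page shows before JavaScript fills in the table.
PLACEHOLDER = "Still loading"


def placeholder_gone(driver):
    """Wait condition for WebDriverWait: True once the placeholder text has disappeared."""
    return PLACEHOLDER not in driver.page_source


def read_tables_after_js(url, timeout=20):
    """Render `url` in headless Chrome, wait for the table, then parse it with pandas."""
    # Imported here so placeholder_gone can be used without these packages installed.
    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Polls placeholder_gone(driver) until it returns True;
        # raises TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(placeholder_gone)
        html = driver.page_source  # the DOM *after* JavaScript has run
    finally:
        driver.quit()
    # Wrap in StringIO: recent pandas versions deprecate passing literal HTML strings.
    return pd.read_html(StringIO(html))
```

For example, tables = read_tables_after_js("https://example.com/page") would return the usual list of DataFrames (the URL is a placeholder). If the table never loads, WebDriverWait raises selenium.common.exceptions.TimeoutException; waiting on a page-specific condition like this is generally more reliable than a fixed time.sleep.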