Reputation: 61
Today when experimenting with using importXML in Google Sheets, I ran into a problem. I was attempting to import the title header of a USTA Tournament page into the Google Sheet, however, this did not work as it just resulted in the HTML title of the webpage being displayed ('TournamentHome'). Below is the Google Sheet, and the website that is used:
Google Sheet and Function:
=importXML(F2, "//html//body[@id='thebody']//div[@id='content']//div[@id='pagetitle']")
Website and Section of Source Code Being Used
The title that I am trying to extract from the website is TOWPATH 24th ANNUAL THANKSGIVING JR SINGLES.
The link to the website is https://m.tennislink.usta.com/tournamenthome?T=225779
Upvotes: 0
Views: 577
Reputation: 1
=REGEXEXTRACT(QUERY(ARRAY_CONSTRAIN(IMPORTDATA(
"https://m.tennislink.usta.com/tournamenthome?T=225779"), 555, 1),
"where Col1 contains 'escape'"), "\(""(.*)""\)")
unfortunately, that won't be possible the way you trying because the field you attempt to scrape is controlled by JavaScript and Google Sheets can't understand/import JS. you can test this simply by disabling JS for a given link and you will see what exactly can be imported into Google Sheets:
Upvotes: 1
Reputation: 201348
How about this sample formula? In this formula, the title value is directly retrieved from the script before the value is put to #pagetitle
. Please think of this as just one of several answers.
=REGEXEXTRACT(IMPORTXML(A1,"//div[@class='tournament_search']/script"),"escape\(""([\w\s\S]+)""")
When https://m.tennislink.usta.com/TournamentHome/tournament.aspx?T=38079
and https://m.tennislink.usta.com/tournamenthome?T=225779
are put in "A1" and "A2", the results are as follows.
Upvotes: 1