Reputation: 3
So I am trying to use Xpath to import information from the site into a google spreadsheet and I am struggling a little with <div>
.
<div class="value-display__value">
<div title="" data-html-title="">
#295
</div>
</div>
The idea is to import '#295', and here is my code.
IMPORTXML($C2,"//div[@class='value-display__value']//div/text()")
google sheets seems to import empty content instead of 295
Upvotes: 0
Views: 987
Reputation: 4988
It looks like the ranking number is populated at runtime, so you can't just use ImportXML()
alone. The page also makes no XHR's after loading, which tells me that the data is somewhere on the page already. This gives you two options:
Request an API key and use the get_user
API. More info here and on their wiki. They say that their API is going to be going to v2 soon, though, so watch out for that if you want a long-term solution.
Scrape the static HTML file for the right data, since that's where the data resides. I'm assuming you're after the player's rank, so we can do this in two stages:
This scrapes the URL you gave in the comments. I found the ranking data in a script tag with id json-user
, so I used.
=IMPORTXML("https://osu.ppy.sh/users/4504101","//script[@id='json-user']")
Then use a REGEXEXTRACT()
to find the data you want. The part of the string we're interested in is: "rank":{"global":1,"country":1}}
, so I did
=REGEXEXTRACT(A1,"""rank"":{""global"":(\d+),""country"":(\d+)")
The parens around (\d+)
create capture groups for the numbers. This yields two cells: the first is for global and the second is for country rank. If you're just interested in the country rank, you can leave the parens off of the first \d+
.
Upvotes: 1