Reputation: 154
I want to obtain club activities from Strava. I originally looked at using the API with C# (because that's what I know), but owing to gaps in the information the API provides, I've turned to the technique described here (https://twitter.com/OleksMaistrenko/status/1252251408495190018). This has been a fantastic resource and has got me 90% of the way there. I'm now trying to get some more information out of the HTML and, being a complete Python/lxml newbie, I can't see how to do it.
So, to get the activity pace, this HTML:
<li title="Pace">
"7:46"
<abbr class="unit" title="minutes per mile"> /mi</abbr>
</li>
is scraped by the following code:
activity_pace = activity.xpath(".//li[@title='Pace']")[0].text.strip()
Q1. So how do I scrape this HTML to obtain the activity duration?
<li title="Time">
"56"
<abbr class="unit" title="minute">m</abbr>
" 26"
<abbr class="unit" title="second">s</abbr>
</li>
I tried this, but it only fetches the minutes:
activity_time = activity.xpath(".//li[@title='Time']")[0].text
Q2. I'd like to get the activity title (in this instance, 'Morning Run'). Here's the HTML:
<h3 class="entry-title activity-title" str-on="click" str-trackable-id="ChQIBTIQCIGRyLgMGAEwLDgAQABIARIECgIIBA==" str-type="self">
<div class="entry-type-icon"><span class="app-icon-wrapper "><span class="app-icon icon-run icon-dark icon-lg"></span></span></div>
<strong>
<a href="/activities/3339847809">Morning Run</a>
</strong>
</h3>
I've worked out that the block can be reached with this:
activity.xpath(".//h3[@class='entry-title activity-title']")[0]
but after that I'm stumped :-(
Upvotes: 1
Views: 834
Reputation: 24940
It's not very elegant, but it can be done this way. Let's say that your HTML looks like this:
activity = """
<doc>
<h3 class="entry-title activity-title" str-on="click" str-trackable-id="ChQIBTIQCIGRyLgMGAEwLDgAQABIARIECgIIBA==" str-type="self">
<div class="entry-type-icon"><span class="app-icon-wrapper "><span class="app-icon icon-run icon-dark icon-lg"></span></span></div>
<strong>
<a href="/activities/3339847809">Morning Run</a>
</strong>
</h3>
<li title="Time">
"56"
<abbr class="unit" title="minute">m</abbr>
" 26"
<abbr class="unit" title="second">s</abbr>
</li>
</doc>"""
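import lxml.html

doc = lxml.html.fromstring(activity)
# Q2: the title is the text of the <a> inside the <h3>
sports = doc.xpath("//h3[@class='entry-title activity-title']//a/text()")
duration = doc.xpath('//li[@title="Time"]')
# Q1: blank out the unit labels ("m", "s") so only the numbers remain
abbrs = doc.xpath('//abbr[@class="unit"]')
for abbr in abbrs:
    abbr.text = ''
for sport in sports:
    print(sport)
for d in duration:
    # Join the remaining text nodes; the adjacent quotes become a colon
    print(d.text_content().strip().replace('\n', '').replace(' ', '').replace('""', ':'))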
Output:
Morning Run
"56:26"
Upvotes: 2
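For what it's worth, the reason `.text` only returns the minutes is that in lxml an element's `.text` holds just the text before its first child element; text that follows a child sits in that child's `.tail`. Below is a minimal sketch of reading the duration that way, assuming the live page does not contain the literal double quotes (browser dev tools usually add those around text nodes when you copy the HTML). The trimmed-down `html` snippet, the `<div>`/`<ul>` wrappers and the variable names are illustrative assumptions, not Strava's actual markup.

```python
import lxml.html

# Assumed markup: the snippets from the question, minus the literal
# double quotes and the extra attributes, wrapped in a hypothetical <div>.
html = """
<div>
  <h3 class="entry-title activity-title">
    <strong><a href="/activities/3339847809">Morning Run</a></strong>
  </h3>
  <ul>
    <li title="Time">
      56<abbr class="unit" title="minute">m</abbr> 26<abbr class="unit" title="second">s</abbr>
    </li>
  </ul>
</div>
"""

activity = lxml.html.fromstring(html)
time_li = activity.xpath(".//li[@title='Time']")[0]

# .text is only the text before the first child element (the minutes).
minutes = time_li.text.strip()
# The seconds follow the first <abbr>, so they live in its .tail.
seconds = time_li.xpath("./abbr")[0].tail.strip()
activity_time = f"{minutes}:{seconds}"

# text_content() concatenates every text node, unit labels included;
# split/join collapses the surrounding whitespace.
activity_time_with_units = " ".join(time_li.text_content().split())

# The title is the text of the <a> nested inside the <h3>.
activity_title = activity.xpath(
    ".//h3[contains(@class, 'activity-title')]//a/text()"
)[0].strip()

print(activity_time)
print(activity_time_with_units)
print(activity_title)
```

If the real page nests things differently (for example, hours as a third unit), the `.tail` lookups would need adjusting, so treat this as a starting point rather than a drop-in solution.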