Using Python & lxml to web scrape Strava

Question

I want to obtain Club activities from Strava. I was originally looking at using the API and C# (cos that's what I know), but owing to gaps in the information the API provides, I've turned to the technique here (https://twitter.com/OleksMaistrenko/status/1252251408495190018). This has been a fantastic resource and has got me 90% of the way there. I'm now trying to get some more information out of the HTML and, being a complete Python/lxml newbie, I can't see how to do it.

So, to get the activity pace, this HTML:

<li title="Pace">
      "7:46"
      <abbr class="unit">/mi</abbr>
</li>

is scraped by the following code:

activity_pace = activity.xpath(".//li[@title='Pace']")[0].text.strip()

Q1. So how do I scrape this HTML to obtain the activity duration?

<li title="Time">
   "56"
   <abbr class="unit">m</abbr>
    " 26"
   <abbr class="unit">s</abbr>
</li>

I tried this & it only fetches the minutes:

activity_time = activity.xpath(".//li[@title='Time']")[0].text

Q2. I'd like to get the activity title (in this instance, 'Morning Run'). Here's the HTML:

<h3 class="entry-title activity-title">
  <a>
  Morning Run
  </a>
</h3>

I've worked out that the block can be got at with this:

activity.xpath(".//h3[@class='entry-title activity-title']")[0]

but after that I'm stumped :-(

Jack Fleeting · Accepted Answer

It's not very elegant, but it can be done this way. Let's say that your HTML looks like this:

activity = """
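<h3 class="entry-title activity-title">
  <a>
  Morning Run
  </a>
</h3>
<li title="Time">
   "56"
   <abbr class="unit">m</abbr>
    " 26"
   <abbr class="unit">s</abbr>
</li>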
"""

import lxml.html
doc = lxml.html.fromstring(activity)

sports = doc.xpath("//h3[@class='entry-title activity-title']//a/text()")
duration = doc.xpath('//li[@title="Time"]')
abbrs = doc.xpath('//abbr[@class="unit"]')

# Blank out the unit labels ("m", "s") so only the numbers remain
for abbr in abbrs:
    abbr.text = ''
for sport in sports:
    print(sport.strip())
for d in duration:
    # Drop newlines and spaces, then turn the adjoining quotes into a colon
    print(d.text_content().strip().replace('\n', '').replace(' ', '').replace('""', ':'))

Output:

Morning Run
"56:26"
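
As for why `.text` in the question only returned the minutes: in lxml, an element's `.text` holds only the text before its first child element, and any text that follows a child is stored on that child's `.tail`. The sketch below illustrates this on a small stand-in snippet. The markup is an assumption (the tags are inferred from the XPath expressions above, and the quotation marks are assumed to be how the browser inspector shows text nodes rather than part of the page), and `duration_to_seconds` is a hypothetical helper, not part of lxml.

```python
import re
import lxml.html

# Assumed markup: a guess at the structure, based on the XPath expressions
# used in this thread. The real Strava page may differ.
snippet = """
<div>
  <h3 class="entry-title activity-title"><a>Morning Run</a></h3>
  <ul>
    <li title="Time">56<abbr class="unit">m</abbr> 26<abbr class="unit">s</abbr></li>
  </ul>
</div>
"""

doc = lxml.html.fromstring(snippet)
li = doc.xpath('//li[@title="Time"]')[0]

# .text is only the text that comes before the first child element
first_text = li.text

# Text that follows a child element lives on that child's .tail
tails = [child.tail for child in li]

# text_content() concatenates every text node in the subtree
full = li.text_content()

# The same call works for the title, whatever tags wrap it
h3 = doc.xpath("//h3[@class='entry-title activity-title']")[0]
title = h3.text_content().strip()


def duration_to_seconds(text):
    """Hypothetical helper: turn '56m 26s' or '1h 2m 3s' into seconds."""
    factors = {"h": 3600, "m": 60, "s": 1}
    pairs = re.findall(r"(\d+)\s*([hms])", text)
    return sum(int(number) * factors[unit] for number, unit in pairs)


seconds = duration_to_seconds(full)
print(title, repr(first_text), tails, repr(full), seconds)
```

Keeping the unit labels in place and parsing them with a regular expression avoids the chain of `replace` calls, and it should also cope with activities longer than an hour, provided the page uses an `h` unit in the same way.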
