scotmanDavid

Reputation: 159

Python web scraper with XPath returns no job results

I am trying to scrape all the available jobs from the following site: https://www.twitch.tv/jobs/careers/. I've checked the network settings and there is nothing there.

So I tried to extract the data using XPath, but still nothing.

    import requests
    from lxml import html

    data = []
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0"
    }
    url = "https://www.twitch.tv/jobs/careers/"
    page = requests.get(url, headers=headers)
    tree = html.fromstring(page.content)

    # Select the text of every link on the page
    xpath = './/a/text()'
    jobs = tree.xpath(xpath)
    for job in jobs:
        print(job)

All that returns is:

Job Openings
Early Career
Equity & Inclusion
Blog
twitch.tv
Job Openings
Early Career
Equity & Inclusion
Blog
....................................
....................................
....................................
Stream
Watch
Develop
Advertise
twitch.tv
Jobs
Merch
Brand
TwitchCon
Meetups
News
Press
Bits
Subs
Turbo
Prime
Extensions
Sings
Legal
Help Center
Security
Twitter
Facebook
Instagram
Terms of service
Privacy Policy
Ad Choices
Cookie Policy
Partners
Affiliates

Upvotes: 0

Views: 266

Answers (2)

RichEdwards

Reputation: 3753

The careers site is built with Svelte. It works with requests, but it takes a bit more effort. All the information you want is available in the page, you just have to work for it :-)

All the job information is in a Base64-encoded JSON string.

You'll need to get the <script> element that holds it and parse the big string at the end.

Then you can use the base64 library to decode it:

    import base64

    # mystr_encoded is the big Base64 string taken from the <script> element
    likeThis = base64.b64decode(mystr_encoded).decode('utf-8')

Just so you know what you're looking for, you can copy that Base64 string into an online decoder and you'll see the JSON.
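As a rough sketch of the decode step, here is one way to go from the Base64 string to a Python object you can explore. The function name `decode_slab` and the sample payload (with its `jobs` and `title` keys) are made up for illustration; the real structure is whatever the page actually contains, so inspect the keys before relying on any of them.

```python
import base64
import json


def decode_slab(encoded):
    """Decode a Base64 string and parse the JSON text it contains."""
    # Restore any '=' padding that may have been stripped from the string.
    padded = encoded + "=" * (-len(encoded) % 4)
    return json.loads(base64.b64decode(padded).decode("utf-8"))


# Stand-in payload for illustration only; the real string comes from the
# page's <script> element and its keys are not known here.
sample = base64.b64encode(
    json.dumps({"jobs": [{"title": "Engineer"}]}).encode("utf-8")
).decode("ascii")

decoded = decode_slab(sample)
# Start by listing the top-level keys to see what the JSON holds.
print(list(decoded.keys()))
```

Once you have the parsed object, it is usually easiest to print the top-level keys first and drill down from there.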

Upvotes: 2

gangabass

Reputation: 10666

The jobs you're looking for are generated using JavaScript. Have a look at this part (a very long Base64-encoded string):

    window.svelteSlabs["3f0161678d3ce5563fa00948dfb2245e"] = "eyJfYm9va3Nob3BfbmFtZSI6ImNhcmVlcnM....."

After you Base64-decode it, you'll get a JSON string that contains everything you need.
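To pull that string out of the page source, a regular expression is probably enough. This is only a sketch: the pattern assumes every assignment looks like the `window.svelteSlabs["..."] = "..."` line above, and `find_slabs` and the sample HTML are hypothetical names and data, not anything taken from the site.

```python
import re

# Assumed shape of the assignment, based on the line quoted above:
#   window.svelteSlabs["<id>"] = "<base64 string>"
SLAB_PATTERN = re.compile(r'window\.svelteSlabs\["([^"]+)"\]\s*=\s*"([^"]+)"')


def find_slabs(page_html):
    """Return a dict mapping each slab id to its still-encoded Base64 string."""
    return dict(SLAB_PATTERN.findall(page_html))


# Tiny stand-in for the real page source, for illustration only.
sample_html = '<script>window.svelteSlabs["abc123"] = "eyJhIjogMX0=";</script>'
print(find_slabs(sample_html))
```

With the code from the question, you would pass `page.text` to `find_slabs`. Each value in the returned dict is still Base64-encoded, so it needs decoding and JSON parsing before use.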

Upvotes: 0
