Reputation: 159
So I am trying to scrape all the available jobs from the following site:
https://www.twitch.tv/jobs/careers/
I've checked the network setting and there is nothing there
so I tried to extract the data using xpath but still nothing.
data = []
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0"
}
url = "https://www.twitch.tv/jobs/careers/"
page = requests.get(url, headers=headers)
tree = html.fromstring(page.content)
xpath = './/a/text()'
jobs = tree.xpath(xpath)
for job in jobs:
print(job)
and all that returns is
Job Openings
Early Career
Equity & Inclusion
Blog
twitch.tv
Job Openings
Early Career
Equity & Inclusion
Blog
....................................
....................................
....................................
Stream
Watch
Develop
Advertise
twitch.tv
Jobs
Merch
Brand
TwitchCon
Meetups
News
Press
Bits
Subs
Turbo
Prime
Extensions
Sings
Legal
Help Center
Security
Twitter
Facebook
Instagram
Terms of service
Privacy Policy
Ad Choices
Cookie Policy
Partners
Affiliates
Upvotes: 0
Views: 266
Reputation: 3753
The career website is using svelte
- it works with requests but it's a bit more effort. All your information you want is available in the page, you just got to work for it :-)
All the job information is in a base64 encoded json string.
You'll need to get this <script>
object and parse the big string at the end:
Then you can to use the base64 library to decode it:
import base64
likeThis = base64.b64decode(mystr_encoded).decode('utf-8')
Just so you know what you're looking for, you can copy that base64 string into an online decoder and you'll see the json:
Upvotes: 2
Reputation: 10666
The jobs you're looking for a generated using Javascript. Have a look at this part (a very long Base64 encoded string):
window.svelteSlabs["3f0161678d3ce5563fa00948dfb2245e"] = "eyJfYm9va3Nob3BfbmFtZSI6ImNhcmVlcnM....."
After you Base64 decode it you'll receive a JSON string that will contains everything you need.
Upvotes: 0