chethe

Reputation: 33

Webscraping: output different to original data

I'm trying to scrape a website that displays a countdown timer (the goal is to eventually make a Discord bot that reports the time remaining on the timer when requested). However, when I print the data, the output differs from what the page shows.

Looking around, I couldn't find a solution to my problem. I'm sure I'm missing something, but I'm clueless as to what it is (I'm only doing this as a personal project and have very little previous experience with Python).

import requests
from bs4 import BeautifulSoup

result = requests.get("https://www.wowclassiccountdown.com/")
print(result.status_code)  # 200 means the request itself succeeded

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(result.content, "html.parser")

# Collect every <div class="fusion-digit"> element on the page
samples = soup.find_all("div", class_="fusion-digit")

data = {}
for div in samples:
    title = div.get_text(strip=True)
    data[title] = div.attrs["class"]

# displays data
print(data)

I can't tell you what the expected output is as it's always changing, but it clearly should not be all 0. Can someone explain this to me?

Upvotes: 3

Views: 81

Answers (2)

Nazim Kerimbekov

Reputation: 4783

The website you are trying to scrape uses JavaScript to drive the countdown (try disabling JavaScript in your web browser and you will see the countdown sit at 0). requests only downloads the HTML and never runs the page's scripts, so sadly the live countdown digits cannot be scraped with the requests library.
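
To illustrate the point, here is a minimal sketch of what a client that does not run JavaScript sees. The HTML string is hypothetical: it only borrows the fusion-digit class name from the question, and the real page's markup may well differ. DigitCollector is an illustrative helper, not part of any library.

```python
from html.parser import HTMLParser

# Hypothetical static markup: the digits start at 0 and would only be
# updated by a script running in a browser.
STATIC_HTML = """
<div class="countdown">
  <div class="fusion-digit">0</div>
  <div class="fusion-digit">0</div>
</div>
"""


class DigitCollector(HTMLParser):
    """Collects the text inside every <div class="fusion-digit">."""

    def __init__(self):
        super().__init__()
        self.in_digit = False
        self.digits = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "fusion-digit" in classes:
            self.in_digit = True

    def handle_data(self, data):
        if self.in_digit and data.strip():
            self.digits.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_digit = False


parser = DigitCollector()
parser.feed(STATIC_HTML)
print(parser.digits)  # only the placeholder zeros from the static HTML
```

No script ever runs here, so the parser can only return whatever placeholder text the server sent.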

Upvotes: 2

QHarr

Reputation: 84465

You can calculate it yourself. The target end datetime for the countdown is in the response from requests, in the data-timer attribute of the countdown element. Take the current datetime and subtract it from the end datetime. The code below does not break the result down into hours and minutes, but that is easy to derive from the seconds.

import datetime

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.wowclassiccountdown.com/')
soup = bs(r.content, 'lxml')

# The countdown's end time is stored in the element's data-timer attribute
end = soup.select_one('#fusion-countdown-1')['data-timer']
ends = datetime.datetime.strptime(end, '%Y-%m-%d-%H-%M-%S')

# Current local time, truncated to whole seconds
starts = datetime.datetime.now().replace(microsecond=0)

diff = ends - starts  # timedelta until the countdown ends
print(diff)

For me, there is also a 9-hour time difference that needs to be accounted for.
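
A minimal sketch of the two remaining steps, splitting the difference into days, hours, minutes and seconds and correcting for a time zone offset. The function name remaining and the site_utc_offset_hours parameter are made up for illustration, and the timer strings are sample values rather than data from the site. Which time zone the site's timer actually uses is an assumption you would need to check yourself.

```python
import datetime


def remaining(timer, now, site_utc_offset_hours=0):
    """Return (days, hours, minutes, seconds) left until `timer`.

    timer: string in the '%Y-%m-%d-%H-%M-%S' layout used above
    now: timezone-aware datetime for the current moment
    site_utc_offset_hours: ASSUMED UTC offset of the timer's clock
    """
    site_tz = datetime.timezone(datetime.timedelta(hours=site_utc_offset_hours))
    end = datetime.datetime.strptime(timer, '%Y-%m-%d-%H-%M-%S').replace(tzinfo=site_tz)

    # Clamp at zero so an expired countdown does not go negative
    total = max(int((end - now).total_seconds()), 0)

    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return days, hours, minutes, seconds


# Sample values only; in real use pass datetime.datetime.now(datetime.timezone.utc)
now = datetime.datetime(2019, 8, 1, 12, 0, 0, tzinfo=datetime.timezone.utc)
print(remaining('2019-08-03-15-30-10', now))  # → (2, 3, 30, 10)
```

Using timezone-aware datetimes on both sides keeps the subtraction correct regardless of where the bot runs; only the offset assumed for the site's timer has to be right.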

Upvotes: 1
