user5743273

Time scraped from a website with Python is one hour behind the browser

Okay, here is the problem. When I look at the match times in my browser, a match shows, for example, 14:00, but when I fetch the same page with my Python bot I get a time one hour earlier, for example 13:00. My question is: how can I make Python use the clock of the region I connect from? In other words, how does the website decide which time to show to my Python script?

Note: My time zone is GMT+3 (Istanbul, Turkey). Here is the webpage: hltv.org/matches. Here is my code:

import datetime, requests, time
from bs4 import BeautifulSoup

matchlinks_um = []

r = requests.get('http://hltv.org/matches')
sauce = r.content
soup = BeautifulSoup(sauce, 'lxml')

for links in soup.find(class_="standard-headline", text=(datetime.date.today())).find_parent().find_all(
        class_="upcoming-match"):
    matchlinks_um.append('https://hltv.org' + links.get('href'))

for x in range(len(matchlinks_um)):
    r = requests.get(matchlinks_um[x])
    sauce = r.content
    soup = BeautifulSoup(sauce, 'lxml')

    a = soup.find('div', class_='time').text
    print(a)

By the way, if you have a suggestion for a better title, I can change it.

Upvotes: 2

Views: 581

Answers (1)

t.m.adam

Reputation: 15376

I suspect that the correct time is rendered by JavaScript, because if you disable JavaScript in your browser you'll get the same results as with your Python script.
Usually, when parsing dynamic content, the solution is Selenium or a similar client, but in this case there is a Unix timestamp in your tag's attributes (data-unix), which we can use to get the correct time.

import datetime
import requests
from bs4 import BeautifulSoup

r = requests.get('http://hltv.org/matches')
sauce = r.text
soup = BeautifulSoup(sauce, 'lxml')

matchlinks_um = []

for links in soup.find(class_="standard-headline", text=(datetime.date.today())).find_parent().find_all(
        class_="upcoming-match"):
    matchlinks_um.append('https://hltv.org' + links.get('href'))

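for link in matchlinks_um:
    r = requests.get(link)
    soup = BeautifulSoup(r.text, 'lxml')
    # data-unix appears to hold the match start as a Unix timestamp in milliseconds
    a = soup.find('div', class_='time')['data-unix']
    # keep the first 10 digits (seconds) and convert to the machine's local time
    t = datetime.datetime.fromtimestamp(int(a[:10])).time()
    print(t)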

Note that t is a datetime.time object, but you can easily convert it to a string if you like, for example with str(t) or t.strftime('%H:%M').
Also, when parsing HTML it's best to use r.text rather than r.content, because .text holds the decoded content.
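
If the script might run on a machine whose local time zone differs from yours, the conversion can be pinned to an explicit offset instead of relying on the system clock. Here is a minimal sketch, assuming data-unix is a millisecond timestamp (as the 13-digit values suggest) and using a fixed GMT+3 offset for Istanbul. The helper name unix_ms_to_local and the sample value are made up for illustration:

```python
import datetime

# Fixed GMT+3 offset for Istanbul (assumption: no daylight saving adjustment is needed)
ISTANBUL = datetime.timezone(datetime.timedelta(hours=3))


def unix_ms_to_local(value, tz=ISTANBUL):
    """Convert a millisecond Unix timestamp string to an aware datetime in tz."""
    seconds = int(value) / 1000
    return datetime.datetime.fromtimestamp(seconds, tz=tz)


# Hypothetical attribute value, not taken from the site
dt = unix_ms_to_local('1500000000000')
print(dt.strftime('%H:%M'))
```

Passing tz explicitly makes the result independent of the time zone configured on the machine that runs the bot.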


But even if the tag had no 'data-unix' attribute, we could still get the correct time by adding one hour to the value of the tag's text with timedelta. For example:

s = '15:30'
dt = datetime.datetime.strptime(s, '%H:%M') + datetime.timedelta(hours=1)
t = dt.time()

print(t)
#16:30:00
  • s is a string with the value '15:30' (H:M format), like those we get from the website. When we pass this string to strptime we get a datetime object, so we can then add one hour with timedelta.
  • dt is a datetime object with the value 1900-01-01 16:30:00 (15:30 + 1 hour). By calling its .time() method we get a datetime.time object.
  • t is a datetime.time object with the value 16:30:00. You could get the hour with t.hour (an integer), do more calculations, convert it to a string, or keep it as it is.
    The point is that t is s + 1 hour.
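
The same steps can be wrapped in a small function that takes the scraped text and returns a string again. This is only a sketch: shift_time is a hypothetical helper, and the one-hour default is just the difference observed in the question, so it would need adjusting if the site's default zone or your own offset changes.

```python
import datetime


def shift_time(text, hours=1):
    """Parse an 'HH:MM' string, add the given number of hours, return 'HH:MM'."""
    dt = datetime.datetime.strptime(text, '%H:%M') + datetime.timedelta(hours=hours)
    return dt.strftime('%H:%M')


print(shift_time('15:30'))
print(shift_time('23:30'))  # wraps past midnight; the date part is discarded
```

Because only the time of day is kept, a value such as 23:30 wraps around to 00:30 and the change of date is lost.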

About the 'data-unix' attribute: I don't know if it's a standard attribute (this is the first time I've seen it), so I don't think you'll find it on other websites.

Upvotes: 2
