Reputation:
Okay, here's the deal. When I look at the match times in my browser, a match shows 14:00, for example, but when I pull the same page with my Python bot I get a time one hour earlier, e.g. 13:00. My question is: how can I make Python get the times for the region I'm connecting from? In other words, how does the website decide which timezone to use for my Python client?
Note: My timezone is GMT+3 (Istanbul, Turkey). Here is the webpage: hltv.org/matches. Here is my code:
import datetime, requests, time
from bs4 import BeautifulSoup
matchlinks_um = []
r = requests.get('http://hltv.org/matches')
sauce = r.content
soup = BeautifulSoup(sauce, 'lxml')
for links in soup.find(class_="standard-headline", text=(datetime.date.today())).find_parent().find_all(
        class_="upcoming-match"):
    matchlinks_um.append('https://hltv.org' + links.get('href'))
for x in range(len(matchlinks_um)):
    r = requests.get(matchlinks_um[x])
    sauce = r.content
    soup = BeautifulSoup(sauce, 'lxml')
    a = soup.find('div', class_='time').text
    print(a)
Btw, if you have any suggestions for the title, I can change it.
Upvotes: 2
Views: 581
Reputation: 15376
I suspect that the correct time is rendered by JavaScript, because if you disable JS in your browser you'll get the same results as with your Python script.
Usually when parsing dynamic content the solution is selenium or similar clients, but in this case there is a Unix timestamp in your tag's attributes (data-unix), which we can use to get the correct time.
import datetime
import requests
from bs4 import BeautifulSoup
r = requests.get('http://hltv.org/matches')
sauce = r.text
soup = BeautifulSoup(sauce, 'lxml')
matchlinks_um = []
for links in soup.find(class_="standard-headline", text=(datetime.date.today())).find_parent().find_all(
        class_="upcoming-match"):
    matchlinks_um.append('https://hltv.org' + links.get('href'))
for link in matchlinks_um:
    r = requests.get(link)
    soup = BeautifulSoup(r.text, 'lxml')
    a = soup.find('div', class_='time')['data-unix']
    # keep the first 10 digits (whole seconds) of the timestamp
    t = datetime.datetime.fromtimestamp(int(a[:10])).time()
    print(t)
Note that t is a datetime.time object, but you could easily convert it to a string if you like.
Also, when parsing HTML it's best to use .text because it holds the decoded content.
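One thing worth knowing about the snippet above: fromtimestamp without a tz argument converts to the timezone of the machine running the script, so a bot on a server in another region would print different times. Here is a minimal sketch that pins the zone explicitly. It assumes the data-unix value is in milliseconds (which the a[:10] slice suggests) and uses a fixed UTC+3 offset to match the GMT+3 zone from the question. The function name and the sample value are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+3 offset, matching the GMT+3 zone from the question.
LOCAL_TZ = timezone(timedelta(hours=3))


def unix_ms_to_local_time(data_unix, tz=LOCAL_TZ):
    """Convert a millisecond Unix timestamp string to a time of day in tz."""
    seconds = int(data_unix) / 1000  # milliseconds -> seconds
    return datetime.fromtimestamp(seconds, tz=tz).time()


# Hypothetical attribute value, not taken from the site.
t = unix_ms_to_local_time('1500000000000')
print(t.strftime('%H:%M'))  # 05:40
```

If you need a zone with daylight saving rules, the standard zoneinfo module (Python 3.9+) can supply the tz object instead of a fixed offset.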
But even if the tag had no 'data-unix' attribute, we could still get the correct time by adding one hour to the value of the tag's text with timedelta. For example:
s = '15:30'
dt = datetime.datetime.strptime(s, '%H:%M') + datetime.timedelta(hours=1)
t = dt.time()
print(t)
#16:30:00
s is a string with value '15:30' (H:M format), like those we get from the website. When we pass this string to strptime we get a datetime object, so now we can add one hour with timedelta. dt is a datetime object with value 1900-01-01 16:30:00 (15:30 + 1 hour). By calling the .time method we get a datetime.time object.
t is a datetime.time object with value 16:30:00. You could get the hour with t.hour (integer), do more calculations, convert it to a string, or keep it as it is. In short, t is s + 1 hour.
About the 'data-unix' attribute, I don't know if it's a standard attribute (it's the first time I've seen it), so I don't think you'll find it on other websites.
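If you end up using the timedelta fallback on many matches, it could be wrapped in a small helper. This is only a sketch: shift_time is a name I made up, and the one-hour offset is just what this particular case needs, so it would break whenever the site's zone and yours drift apart (daylight saving, for example).

```python
from datetime import datetime, timedelta


def shift_time(text, hours):
    """Parse an 'HH:MM' string and return it shifted by `hours`, as 'HH:MM'."""
    dt = datetime.strptime(text, '%H:%M') + timedelta(hours=hours)
    return dt.strftime('%H:%M')


print(shift_time('15:30', 1))  # 16:30
print(shift_time('23:30', 1))  # 00:30 (the date rolls over, the time wraps)
```

Going through a full datetime rather than doing arithmetic on the hour yourself means times near midnight wrap around correctly.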
Upvotes: 2