My code:
import requests
import re
from bs4 import BeautifulSoup
r = requests.get(
"https://www.traveloka.com/hotel/detail?spec=22-9-2016.24-9-2016.2.1.HOTEL.3000010016588.&nc=1474427752464")
data = r.content
soup = BeautifulSoup(data, "html.parser")
ratingdates = soup.find_all("div", {"class": "reviewDate"})
for i in range(0, 10):
    print(ratingdates[i].get_text())
This code prints "Invalid date" instead of the dates. How can I get the actual dates?
Additional note:
It seems the solution is to use Selenium or spynner, but I don't know how to use them. Moreover, I can't install spynner: the installation always gets stuck on installing lxml.
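A likely explanation is that the review dates are filled in by JavaScript in the browser, so the HTML that requests downloads still holds placeholder text. The stdlib-only sketch below reproduces that symptom offline. ReviewDateParser is a hypothetical stand-in for the find_all call above, and the HTML snippet is invented to mimic the placeholder markup; it is not copied from the real page.

```python
from html.parser import HTMLParser


class ReviewDateParser(HTMLParser):
    """Collect the text of every <div class="reviewDate"> element.

    Hypothetical stdlib stand-in for
    soup.find_all("div", {"class": "reviewDate"}).
    Assumes review-date divs contain no nested <div> elements.
    """

    def __init__(self):
        super().__init__()
        self.dates = []
        self._in_date = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "reviewDate" in classes:
            self._in_date = True
            self.dates.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self._in_date:
            self._in_date = False

    def handle_data(self, data):
        # Only keep text that appears inside a reviewDate div
        if self._in_date:
            self.dates[-1] += data


# Invented HTML as it might look before any JavaScript runs:
# the placeholder text is all a plain HTTP fetch can see.
static_html = """
<div class="review"><div class="reviewDate">Invalid date</div></div>
<div class="review"><div class="reviewDate">Invalid date</div></div>
"""

parser = ReviewDateParser()
parser.feed(static_html)
print(parser.dates)  # ['Invalid date', 'Invalid date']
```

No parser, however careful, can recover dates that are not in the downloaded HTML; the page has to be rendered by something that executes its JavaScript first.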
It's really simple if you use Selenium. Here's a basic example with some explanation:
To install Selenium, run pip install selenium.
from time import sleep
from bs4 import BeautifulSoup
from selenium import webdriver
# Set webdriver's browser to Firefox
driver = webdriver.Firefox()
try:
    # Load the page in the browser, which also runs its JavaScript
    driver.get(
        "https://www.traveloka.com/hotel/detail?spec=22-9-2016.24-9-2016.2.1.HOTEL.3000010016588.&nc=1474427752464")
    # Pause 5 seconds after page load so the dates have time to load
    # (implicitly_wait() would not pause here; it only sets a timeout for find_element calls)
    sleep(5)
    # Get the rendered page source
    data = driver.page_source
finally:
    # Close the browser even if something above fails
    driver.quit()
# The rest is pretty much the same
soup = BeautifulSoup(data, "html.parser")
ratingdates = soup.find_all("div", {"class": "reviewDate"})
# Print all dates found instead of using range(0, 10), which raises IndexError if there are fewer than 10
for i in ratingdates:
    print(i.get_text())
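Instead of a fixed delay, Selenium also offers explicit waits (WebDriverWait(driver, timeout).until(condition)), which repeatedly check a condition until it holds or a timeout expires. The sketch below illustrates that polling idea using only the standard library. wait_until is a hypothetical helper written for illustration, not part of Selenium, and the fake condition simply simulates content that appears after a few checks.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll condition() until it returns a truthy value, then return that value.

    Hypothetical helper mimicking the idea behind Selenium's
    WebDriverWait(driver, timeout).until(...); it is not part of Selenium.
    Raises TimeoutError if the condition stays falsy for `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Usage: a fake condition whose "dates" only appear on the third check,
# standing in for a page whose JavaScript has not finished yet.
checks = {"count": 0}


def dates_loaded():
    checks["count"] += 1
    return ["review date"] if checks["count"] >= 3 else []


result = wait_until(dates_loaded, timeout=5, poll_interval=0.01)
print(result)  # ['review date']
```

Compared with a fixed pause, polling returns as soon as the condition holds and fails loudly when it never does. Which condition reliably signals that the dates on this particular page have loaded is an assumption you would need to check against the live page.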
For more on using Selenium, see the documentation: http://selenium-python.readthedocs.io/
If you don't want a Firefox window popping up every time you run the script, you could use PhantomJS, a lightweight headless browser. After downloading and setting it up, just change driver = webdriver.Firefox() to driver = webdriver.PhantomJS() in the example above. (Newer Selenium releases have dropped PhantomJS support; with those, running Firefox in headless mode serves the same purpose.)