Reputation: 380
I run this web scraper on my notebook. It uses Firefox (through Selenium WebDriver) to get the data, and it has to actually open Firefox because the data are generated by JavaScript. I wonder whether a dedicated server can open Firefox and get the data too. I think dedicated servers have no display, so it will not work? The full script is longer (152 lines); I pasted only the parts that I think will not work. I believe importing the data into PostgreSQL will be no problem on a dedicated server.
from selenium import webdriver
import time
from bs4 import BeautifulSoup
import lxml
import re
import psycopg2
import sys
driver = webdriver.Firefox()
driver.set_window_position(-9999, -9999)
driver.get("http://rodos.vsb.cz/Road.aspx?road=D2")
time.sleep(20) #waits till the page loads
html_source = driver.page_source
soup = BeautifulSoup(html_source, 'lxml')
# finds tags with speed information (km/h)
list1 = []  # assumed to be initialised elsewhere in the full script; added so the excerpt runs
for i in soup.find_all("tspan", {"id": re.compile(r"tspan_Label_\w*")}):
    if re.match("^[0-9]+$", str(i.getText())) is not None:
        if str(i.parent.get('fill')) == '#5f5f5f':
            list1.append(i.getText())
Upvotes: 0
Views: 137
Reputation: 2796
I think what you might be looking for is pyvirtualdisplay:
pip install pyvirtualdisplay
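pyvirtualdisplay does not emulate the browser itself. It starts a virtual display that lives in memory (it is a wrapper around Xvfb and similar X servers, so Xvfb has to be installed on the server). Firefox then runs for real and renders into that display, so no physical screen is needed.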
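To see why Firefox fails on a bare server, it can help to check whether any display is advertised at all. On Linux, graphical programs normally locate the X server through the DISPLAY environment variable (Wayland sessions use WAYLAND_DISPLAY). The sketch below is only an illustration using the standard library; the helper name `has_display` is made up for this example and is not part of Selenium or pyvirtualdisplay.

```python
import os


def has_display(environ=None):
    """Return True if an X11 or Wayland display is advertised in the environment.

    `environ` is a hypothetical parameter for testing; by default the real
    process environment is inspected.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


if __name__ == "__main__":
    if has_display():
        print("A display is available; Firefox can open a window.")
    else:
        print("No display found; start a virtual one before launching Firefox.")
```

On a typical headless server this reports that no display was found. As far as I know, starting a pyvirtualdisplay Display sets DISPLAY for the current process, so the same check should pass afterwards.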
import re
import time
from bs4 import BeautifulSoup
from pyvirtualdisplay import Display
from selenium import webdriver
# Set screen resolution to 1366 x 768 like most 15" laptops
display = Display(visible=0, size=(1366, 768))
display.start()
# now Firefox will run in a virtual display.
browser = webdriver.Firefox()
# Sets the width and height of the current window
browser.set_window_size(1366, 768)
# set timeouts (seconds) before loading the page so they apply to this request
browser.set_script_timeout(30)
browser.set_page_load_timeout(30)
# Open the URL
browser.get('http://rodos.vsb.cz/Road.aspx?road=D2')
time.sleep(20) #waits till the page loads
html_source = browser.page_source
soup = BeautifulSoup(html_source, 'lxml')
# finds tags with speed information (km/h)
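list1 = []  # collects the speed values
for i in soup.find_all("tspan", {"id": re.compile(r"tspan_Label_\w*")}):
    if re.match("^[0-9]+$", str(i.getText())) is not None:
        if str(i.parent.get('fill')) == '#5f5f5f':
            list1.append(i.getText())
# close Firefox and shut down the virtual display
browser.quit()
display.stop()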
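Since the browser part is hard to test on a server, it may be worth checking the extraction filter separately on a saved copy of the page source. The sketch below is not the code from the question: it swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra packages, and the names `SpeedLabelParser` and `extract_speeds` are invented for this example. It applies the same three conditions: a `tspan` whose id contains `tspan_Label_`, text made of digits only, and a parent element with `fill="#5f5f5f"`.

```python
import re
from html.parser import HTMLParser

LABEL_ID = re.compile(r"tspan_Label_\w*")
DIGITS = re.compile(r"^[0-9]+$")


class SpeedLabelParser(HTMLParser):
    """Collects numeric tspan labels whose parent has the given fill colour."""

    def __init__(self, fill="#5f5f5f"):
        super().__init__()
        self.fill = fill
        self.stack = []   # currently open elements as (tag, attrs dict)
        self.speeds = []

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if len(self.stack) < 2:
            return
        tag, attrs = self.stack[-1]
        parent_attrs = self.stack[-2][1]
        text = data.strip()  # unlike the original loop, whitespace is trimmed
        if (tag == "tspan"
                and LABEL_ID.search(attrs.get("id") or "")
                and parent_attrs.get("fill") == self.fill
                and DIGITS.match(text)):
            self.speeds.append(text)


def extract_speeds(html):
    """Return the speed labels (as strings) found in the page source."""
    parser = SpeedLabelParser()
    parser.feed(html)
    parser.close()
    return parser.speeds
```

Calling `extract_speeds(html_source)` on the page source captured above should give the same kind of list as `list1`, assuming the markup nests each `tspan` directly inside the element that carries the `fill` attribute.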
Upvotes: 1