nidabdella

Reputation: 821

Running Selenium webdriver from a Python CGI script

I created a Python script that uses Selenium WebDriver to scrape a website. Now I'm trying to run this script from the web using CGI. To make sure my CGI server works, I first tried this:

import cgi
print 'Content-Type: text/html'
print
list_brand = ['VOLVO','FIAT', 'BMW']
print '<h1>TESTING CGI</h1>'
print '<form>'
print '<select>'
for i in range(3):
      print '<option value="' + list_brand[i] + '">'+ list_brand[i] +'</option>'
print '</select>'
print '</form>'

And it worked fine. Now, when I use Selenium with CGI in this script:

import cgitb
import cgi
from selenium import webdriver

print 'Content-Type: text/html'
print
cgitb.enable(display=0, logdir="C:/path/to/log/directory")
path_to_pjs = 'C:path/to/phantomjs-2.1.1-windows/bin/phantomjs.exe'
browser = webdriver.PhantomJS(executable_path = path_to_pjs)
#Reaching to URL
url = 'http://www.website.fr/cl/2/products'
browser.get(url)
div_set = browser.find_elements_by_class_name('productname')
print '<form>'
print '<select>'
for div in div_set:
      print '<option value="' + div.find_element_vy_tag_name('h3').text + '">'+ div.find_element_vy_tag_name('h3').text +'</option>'
print '</select>'
print '</form>'

The page keeps loading but never responds. Is it even possible to run Selenium from a CGI script, and if so, why doesn't my server respond?

Upvotes: 1

Views: 634

Answers (2)

Rublacava

Reputation: 519

That may have worked in 2017, but in 2024, on my setup, a CGI script run by Apache HTTP Server (as www-data) fails to import selenium. With this CGI script:

#!/usr/bin/env python3
import cgi
#from selenium import webdriver
#import selenium
print("Content-type: text/plain")
print()
print("webserver test")

Uncommenting either "from selenium import webdriver" or "import selenium" results in HTTP 500 Internal Server Error. The same imports run without error from a shell: $ python3 -c "import selenium;from selenium import webdriver;print('test bash')"
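To see the actual error behind the HTTP 500, a diagnostic CGI script along these lines might help. It is only a sketch: the function name diagnose_import is made up for illustration, and it assumes the failure is an ImportError. One common cause, though not confirmed here, is that the package was installed only for the login user, so the interpreter running as www-data cannot find it.

```python
#!/usr/bin/env python3
"""CGI diagnostic sketch: report why an import fails instead of a bare 500."""
import getpass
import importlib
import sys
import traceback


def diagnose_import(module_name):
    """Try to import module_name and return a plain-text report.

    diagnose_import is a hypothetical helper name, not part of any library.
    """
    lines = ["python: " + sys.executable]
    try:
        lines.append("user: " + getpass.getuser())
    except Exception:
        # getuser() can fail when the account has no passwd entry
        lines.append("user: (unknown)")
    try:
        importlib.import_module(module_name)
        lines.append("import " + module_name + ": ok")
    except ImportError:
        lines.append("import " + module_name + ": FAILED")
        lines.append(traceback.format_exc())
        lines.append("sys.path:")
        lines.extend("  " + p for p in sys.path)
    return "\n".join(lines)


if __name__ == "__main__":
    # Emit the CGI header first so the report reaches the browser
    print("Content-type: text/plain")
    print()
    print(diagnose_import("selenium"))
```

Comparing the reported interpreter, user, and sys.path with what the shell prints for the same command should show whether the two environments differ.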

The solution now is to do the following in GNU/Linux. This is far from perfect:

  1. Run $ crontab -e and add the line * * * * * /path/to/run.sh
  2. Make run.sh executable ($ chmod +x run.sh).
  3. Contents of "run.sh":
#!/usr/bin/env bash
# Use the desktop session's display so the browser window can open
export DISPLAY=:0
# Only launch the browser while the flag file holds the "on" marker
if [[ $(cat /path/to/run1) == "Yes do it" ]]; then
    python3 -c "from selenium import webdriver;options=webdriver.ChromeOptions();options.binary_location=\"/usr/bin/brave-browser\";driver=webdriver.Chrome(options=options);driver.get(\"$(cat /path/to/run2)\");"
fi
  4. Replace "/path/to/run1" and "/path/to/run2" with actual paths to two empty text files of your own. They should have 777 permissions, or similar ($ chmod 777 run1), so that both the CGI scripts and the cron job can access them.
  5. Create these two files in "/usr/lib/cgi-bin/": urlon.sh and urloff.sh
  6. Contents of "urlon.sh" (set to executable):
#!/bin/bash
echo "Content-type: text/plain"
echo
# Keep everything after "?url=" in the request URI
url="$(echo -n "$REQUEST_URI" | sed "s/.*?url=//g")"
echo "Yes do it" > /path/to/run1
echo "$url" > /path/to/run2
echo "URL: $url"
  7. Contents of "urloff.sh" (set to executable):
#!/bin/bash
echo "Content-type: text/plain"
echo
# Clear the flag so run.sh does nothing on the next cron tick
echo > /path/to/run1
echo "Disabled"
  8. Usage: $ curl -kL https://10.0.0.199/cgi-bin/urlon.sh?url=https://example.com prints "URL: https://example.com", and $ curl -kL https://10.0.0.199/cgi-bin/urloff.sh prints "Disabled". Remember to disable it afterwards, otherwise cron launches the browser again every minute. Also, I'm not sure whether this still works once XScreenSaver or the login screen lock kicks in.
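The sed expression in urlon.sh keeps whatever follows "?url=" verbatim, including percent-encoding, and the result is later pasted into a python3 -c command by run.sh. As a rough sketch of a stricter alternative, the query string could be parsed with Python's standard library. The helper name extract_url is hypothetical, and restricting targets to http and https is an assumption, not something the steps above require.

```python
from urllib.parse import parse_qs, urlsplit


def extract_url(request_uri):
    """Return the decoded 'url' query parameter, or None.

    extract_url is a hypothetical helper. None is returned when the
    parameter is missing or its scheme is not http/https.
    """
    query = urlsplit(request_uri).query
    values = parse_qs(query).get("url")
    if not values:
        return None
    target = values[0]
    if urlsplit(target).scheme not in ("http", "https"):
        return None
    return target


# Example with the request URI from the usage step
print(extract_url("/cgi-bin/urlon.sh?url=https://example.com"))
```

A Python CGI script could call this with os.environ["REQUEST_URI"] and write the result to the run2 file only when it is not None. The scheme check narrows what reaches the shell command but does not make arbitrary input safe to interpolate.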

Upvotes: 0

nidabdella

Reputation: 821

Well, I found the solution to my problem! First, I hadn't noticed that I wrote vy instead of by in my calls: it should be div.find_element_by_tag_name. Second, I switched to an Apache server; for some reason the lightweight Python server based on CGIHTTPServer doesn't work. So I used XAMPP, modified the httpd.conf file, and finally added the interpreter path #!/Python27/python as the first line of the script.
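For reference, the output half of the script could be sketched as below. This is written for Python 3 rather than the Python 2.7 used above, render_select is a made-up helper name, and the product names are sample data standing in for the Selenium results. Escaping the scraped text is an addition that the original script did not have.

```python
from html import escape


def render_select(names):
    """Build a <form><select> block with one escaped <option> per name.

    render_select is a hypothetical helper used for illustration.
    """
    lines = ["<form>", "<select>"]
    for name in names:
        safe = escape(name, quote=True)
        lines.append('<option value="{0}">{0}</option>'.format(safe))
    lines.append("</select>")
    lines.append("</form>")
    return "\n".join(lines)


# In the real script the names would come from Selenium, for example:
#   names = [div.find_element_by_tag_name('h3').text for div in div_set]
# (the corrected call from the question; sample data is used here).
print("Content-Type: text/html")
print()
print(render_select(["VOLVO", "FIAT", "BMW"]))
```

Reading each h3 text once and reusing it also avoids the two lookups per element that the original loop performed.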

Upvotes: 0
