Reputation: 821
I wrote a Python script that uses Selenium WebDriver to scrape a website. Now I'm trying to run this script from the web using CGI. To make sure my CGI server works, I first tried this:
import cgi
print 'Content-Type: text/html'
print
list_brand = ['VOLVO','FIAT', 'BMW']
print '<h1>TESTING CGI</h1>'
print '<form>'
print '<select>'
for i in range(3):
    print '<option value="' + list_brand[i] + '">'+ list_brand[i] +'</option>'
print '</select>'
print '</form>'
And it worked fine. Now, when I combine Selenium with CGI in this script:
import cgitb
import cgi
from selenium import webdriver
print 'Content-Type: text/html'
print
cgitb.enable(display=0, logdir="C:/path/to/log/directory")
path_to_pjs = 'C:/path/to/phantomjs-2.1.1-windows/bin/phantomjs.exe'
browser = webdriver.PhantomJS(executable_path = path_to_pjs)
#Reaching to URL
url = 'http://www.website.fr/cl/2/products'
browser.get(url)
div_set = browser.find_elements_by_class_name('productname')
print '<form>'
print '<select>'
for div in div_set:
    print '<option value="' + div.find_element_vy_tag_name('h3').text + '">'+ div.find_element_vy_tag_name('h3').text +'</option>'
print '</select>'
print '</form>'
The page keeps loading but never responds. Is it even possible to run Selenium from a CGI script, and if so, why doesn't my server respond?
Upvotes: 1
Views: 634
Reputation: 519
That may have worked in 2017, but in 2024 Apache HTTP Server doesn't let a CGI script (running as www-data) import selenium. Take this CGI script:
#!/usr/bin/env python3
import cgi
#from selenium import webdriver
#import selenium
print("Content-type: text/plain")
print()
print("webserver test")
Uncommenting either "from selenium import webdriver" or "import selenium" results in an HTTP 500 Internal Server Error. The same imports raise no error when run from a shell:
$ python3 -c "import selenium;from selenium import webdriver;print('test bash')"
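To see what actually fails, one option is to catch the import error inside the CGI script and print it in the response, instead of letting Apache return a bare 500. This is only a diagnostic sketch: import_report is a hypothetical helper name, and it assumes the script is served as text/plain like the test above. Comparing the printed interpreter path with the one the shell uses may show whether the web server sees a different Python environment.

```python
#!/usr/bin/env python3
import importlib
import sys
import traceback


def import_report(module_name):
    """Try to import a module; return a short status or the full traceback."""
    try:
        importlib.import_module(module_name)
        return "import %s: OK" % module_name
    except Exception:
        # Return the traceback as text so it ends up in the HTTP response.
        return "import %s: FAILED\n%s" % (module_name, traceback.format_exc())


if __name__ == "__main__":
    print("Content-type: text/plain")
    print()
    print("interpreter:", sys.executable)
    print(import_report("selenium"))
```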
The solution now is to do the following in GNU/Linux. This is far from perfect:
$ crontab -e
and add the line
* * * * * /path/to/run.sh
$ chmod +x run.sh
run.sh:
#!/usr/bin/env bash
export DISPLAY=:0
if [[ $(cat /path/to/run1) == "Yes do it" ]]; then
    python3 -c "from selenium import webdriver;options=webdriver.ChromeOptions();options.binary_location=\"/usr/bin/brave-browser\";driver=webdriver.Chrome(options=options);driver.get(\"$(cat /path/to/run2)\");"
fi
$ chmod 777 run1
urlon.sh:
#!/bin/bash
echo "Content-type: text/plain"
echo
url="$(echo -n "$REQUEST_URI" | sed "s/.*?url=//g")"
echo "Yes do it" > /path/to/run1
echo "$url" > /path/to/run2
echo "URL: $url"
urloff.sh:
#!/bin/bash
echo "Content-type: text/plain"
echo
echo > /path/to/run1
echo "Disabled"
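The handshake between the two CGI scripts and the cron job boils down to two files: a flag and a URL. Here is a rough Python sketch of the check that run.sh performs, assuming the same run1 and run2 files. pending_url is a hypothetical helper name, and the http(s) prefix check is an extra precaution that is not in the shell version:

```python
from pathlib import Path

ENABLED_MARKER = "Yes do it"  # same marker string the CGI script writes to run1


def pending_url(flag_path, url_path):
    """Return the URL to open if the flag file is enabled, else None."""
    flag = Path(flag_path)
    url_file = Path(url_path)
    if not flag.exists() or flag.read_text().strip() != ENABLED_MARKER:
        return None
    if not url_file.exists():
        return None
    url = url_file.read_text().strip()
    # Assumption: only plain http(s) URLs are accepted, because the value
    # originally comes from a web request.
    if not url.startswith(("http://", "https://")):
        return None
    return url
```

The cron job would then start the browser only when pending_url returns a value.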
$ curl -kL https://10.0.0.199/cgi-bin/urlon.sh?url=https://example.com
prints "URL: https://example.com", and
$ curl -kL https://10.0.0.199/cgi-bin/urloff.sh
prints "Disabled". Remember to disable it afterwards so it doesn't keep running every minute. Also, I'm not sure whether this still works once XScreenSaver or the login screen lock kicks in.
Upvotes: 0
Reputation: 821
Well, I found the solution to my problem! First, I hadn't noticed that I wrote vy instead of by in my function calls: it should be div.find_element_by_tag_name.
Second, I switched to an Apache server. For some reason the lightweight Python server based on CGIHTTPServer doesn't work. So I used XAMPP, modified the httpd.conf file, and finally added the interpreter path #!/Python27/python as the first line of the script.
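For reference, the output loop can also be written so that each product name is read once and escaped before it goes into the HTML. This is a minimal sketch in Python 3 (the script in the question targets Python 2). render_select is a hypothetical helper name, and the list of names stands in for the texts collected from the h3 elements:

```python
from html import escape


def render_select(names):
    """Build a <form><select> block, escaping each name for HTML."""
    lines = ["<form>", "<select>"]
    for name in names:
        safe = escape(name, quote=True)
        lines.append('<option value="%s">%s</option>' % (safe, safe))
    lines.extend(["</select>", "</form>"])
    return "\n".join(lines)


if __name__ == "__main__":
    print("Content-Type: text/html")
    print()
    print(render_select(["VOLVO", "FIAT", "BMW"]))
```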
Upvotes: 0