Reputation: 31
Goal: I want to run a Selenium Python script through BrowserMob-Proxy, which will capture the traffic and output a HAR file.
Problem: I have a functional (very basic) Python script (shown below). However, when I alter it to use BrowserMob-Proxy to capture a HAR, it fails. Below I provide two different scripts that both fail, but for different reasons (details provided after the code snippets).
BrowserMob-Proxy Explanation: I am using both 0.6.0 AND 2.0-beta-8. The reasoning for this is that A) Lightbody (lead developer of BMP) recently indicated that his most current release (2.0-beta-9) is not functional and advises users to use 2.0-beta-8 instead, and B) from what I can tell from various sites and Stack Overflow answers, 0.6.0 (installed through pip) is used to make calls through client.py/server.py, whereas 2.0-beta-8 is used to launch the server itself. To be honest, this confuses me. When importing BMP's Server, however, it requires a batch (.bat) file to launch the server, which is not provided in 0.6.0 but is provided in 2.0-beta-8... If anyone can shed some light on this area of confusion (I suspect it is the root of the problems described below), I'd be most appreciative.
Software Specs:
Selenium Script (this script works):
"""This script utilizes Selenium to obtain the Google homepage"""
from selenium import webdriver
driver = webdriver.Firefox() # Opens a Firefox browser.
driver.get('https://google.com/') # Gets google.com and loads page in browser.
driver.quit() # Closes Firefox browser
This script succeeds in running and does not produce any errors. It is provided for illustrative purposes to indicate it works before adding BMP logic.
Script ALPHA with BMP (does not work):
"""Using the same functional Selenium script, produce ALPHA_HAR.har output"""
from browsermobproxy import Server
server = Server(r'C:\Users\Matt\Desktop\browsermob-proxy-2.0-beta-8\bin\browsermob-proxy')
server.start()
proxy = server.create_proxy()
from selenium import webdriver
driver = webdriver.Firefox() # Opens a Firefox browser.
proxy.new_har("ALPHA_HAR") # Creates a new HAR
driver.get("https://www.google.com/") # Gets google.com and loads page in browser.
proxy.har # Returns a HAR JSON blob
server.stop()
This script runs without producing any errors. However, after searching my entire hard drive, I cannot find ALPHA_HAR.har anywhere.
Script BETA with BMP (does not work):
"""Using the same functional Selenium script, produce BETA_HAR.har output"""
from browsermobproxy import Server
server = Server(r"C:\Users\Matt\Desktop\browsermob-proxy-2.0-beta-8\bin\browsermob-proxy")
server.start()
proxy = server.create_proxy()
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)
proxy.new_har("BETA_HAR") # Creates a new HAR
driver.get("https://www.google.com/") # Gets google.com and loads page in browser.
proxy.har # Returns a HAR JSON blob
server.stop()
This code was taken from http://browsermob-proxy-py.readthedocs.org/en/latest/. When running the above code, Firefox attempts to load google.com but never succeeds. Eventually it times out without producing any errors, and BETA_HAR.har can't be found anywhere on my hard drive. I have also noticed that when I use this browser instance to visit any other site, it similarly fails to load (I suspect this is because the proxy is not configured properly).
Upvotes: 3
Views: 12498
Reputation: 11
What worked for me was downgrading Java to Java 11. I used jenv to install and manage multiple Java versions.
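If you want to confirm which Java version you are actually on before downgrading, here is a minimal sketch. It assumes the `java` on your PATH is the one BrowserMob-Proxy ends up using, and it relies on `java -version` writing its banner to stderr. The helper names are hypothetical, and the Java 11 figure comes from this answer's experience, not from a documented BMP requirement.

```python
import re
import subprocess


def parse_java_major_version(version_output):
    """Return the major Java version from `java -version` output.

    Handles both the legacy "1.8.0_202" scheme (major = 8) and the
    modern "11.0.2" / "17" scheme (major = 11 / 17).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found in: %r" % version_output)
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        # Legacy scheme: "1.8" means Java 8.
        return int(match.group(2))
    return first


def current_java_major_version():
    """Run `java -version` (which prints to stderr) and parse the result.

    Raises FileNotFoundError if no `java` executable is on the PATH.
    """
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major_version(result.stderr)
```

Calling `current_java_major_version()` before `server.start()` lets the script warn you early instead of failing inside the proxy.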
Upvotes: 1
Reputation: 781
Inherently, the HAR object generated by the proxy is just that: an object in memory. You can't find it on your hard drive because it isn't saved there unless you write it out yourself. This is a simple operation, since the HAR is just JSON:
import json
with open("harfile", "w") as harfile:
    harfile.write(json.dumps(proxy.har))
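A slightly fuller sketch of the same idea, using hypothetical helper names `save_har`/`load_har`. It assumes `proxy.har` hands back a plain Python dict, which is what browsermob-proxy-py returns; reading the file back is a cheap way to confirm it is valid JSON.

```python
import json


def save_har(har, path):
    """Serialize a HAR dict (e.g. the value of proxy.har) to a .har file."""
    with open(path, "w", encoding="utf-8") as har_file:
        json.dump(har, har_file, indent=2, ensure_ascii=False)


def load_har(path):
    """Read a .har file back into a dict."""
    with open(path, encoding="utf-8") as har_file:
        return json.load(har_file)
```

In the ALPHA/BETA scripts this would be `save_har(proxy.har, "BETA_HAR.har")`, called before `server.stop()` so the proxy is still there to answer.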
Once you start dumping your HAR file, you'll find that it is empty with the ALPHA script. This is because you are not adding the proxy to Firefox's settings, so Firefox connects directly and bypasses the proxy.
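A quick way to tell the "browser bypassed the proxy" case apart from other failures is to count the captured entries, since HAR 1.2 keeps each request in `log.entries`. The helper names here are hypothetical, and treating zero entries as "bypassed" is a heuristic, not a guarantee.

```python
def har_entry_count(har):
    """Return how many requests a HAR captured (the length of log.entries)."""
    return len(har.get("log", {}).get("entries", []))


def assert_har_not_empty(har):
    """Fail loudly when nothing was captured, which usually means the
    browser was never pointed at the proxy."""
    count = har_entry_count(har)
    if count == 0:
        raise RuntimeError(
            "HAR has no entries; is the browser actually configured to use the proxy?"
        )
    return count
```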
The BETA script is written correctly as far as connecting to the proxy, although personally I prefer adding the proxy to the capabilities and passing those through. The code for that is:
cap = webdriver.DesiredCapabilities.FIREFOX.copy()
proxy.add_to_capabilities(cap)
driver = webdriver.Firefox(capabilities=cap)
I would guess that your issue lies with the proxy itself. Check the bmp.log and/or server.log files in the directory you ran the Python script from to see whether the proxy reports something going wrong.
Another possibility is that Selenium reports the page as loaded before all of its elements have actually finished loading, so your proxy is shut down too early. Try making the script wait a bit longer before shutting down the proxy, or run it interactively in the interpreter.
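If those logs are long, a small hypothetical helper like this can pull out the suspicious lines. The file names come from this answer; the `ERROR`/`Exception` markers are an assumption about how problems show up in those logs, so adjust them to what you actually see.

```python
import os


def find_log_problems(log_path, markers=("ERROR", "Exception")):
    """Return (line_number, text) pairs for log lines containing any marker.

    Returns an empty list if the log file does not exist.
    """
    if not os.path.isfile(log_path):
        return []
    problems = []
    with open(log_path, encoding="utf-8", errors="replace") as log_file:
        for number, line in enumerate(log_file, start=1):
            if any(marker in line for marker in markers):
                problems.append((number, line.rstrip("\n")))
    return problems


if __name__ == "__main__":
    # Run from the same directory as the Selenium/BMP script.
    for name in ("bmp.log", "server.log"):
        for number, text in find_log_problems(name):
            print(name, number, text)
```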
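Rather than a fixed `time.sleep`, one option is to poll the HAR until the number of entries stops growing. This is a minimal sketch with a hypothetical function name; `get_har` is any zero-argument callable such as `lambda: proxy.har`, and the default quiet period and timeout are arbitrary. browsermob-proxy-py's client also has a `wait_for_traffic_to_stop(quiet_period, timeout)` method that asks the proxy itself to wait, which may be simpler if your version supports it.

```python
import time


def wait_for_har_to_settle(get_har, quiet_period=2.0, timeout=30.0, poll_interval=0.5):
    """Poll get_har() until the entry count is unchanged for quiet_period seconds.

    Returns the last HAR fetched; if timeout expires first, returns whatever
    was captured so far instead of raising.
    """
    deadline = time.monotonic() + timeout
    har = get_har()
    last_count = len(har["log"]["entries"])
    last_change = time.monotonic()
    while time.monotonic() < deadline:
        if time.monotonic() - last_change >= quiet_period:
            break  # no new requests for a whole quiet period
        time.sleep(poll_interval)
        har = get_har()
        count = len(har["log"]["entries"])
        if count != last_count:
            last_count = count
            last_change = time.monotonic()
    return har
```

Usage would be `har = wait_for_har_to_settle(lambda: proxy.har)` right after `driver.get(...)` and before stopping anything.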
Upvotes: 0
Reputation: 685
Try this:
from browsermobproxy import Server
from selenium import webdriver
import json
server = Server("path/to/browsermob-proxy")
server.start()
proxy = server.create_proxy()
profile = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)
proxy.new_har("http://stackoverflow.com", options={'captureHeaders': True})
driver.get("http://stackoverflow.com")
result = json.dumps(proxy.har, ensure_ascii=False)
print(result)
driver.quit()
proxy.close()
server.stop()
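One more thing worth guarding against: if `driver.get` raises, a script like this never reaches its cleanup lines and the Java proxy process can be left running. A sketch using `contextlib.ExitStack` to run cleanup in reverse order no matter what; the function name and its three callables are hypothetical wiring, and only `create_proxy()`, `close()`, `stop()` and `quit()` correspond to real browsermob-proxy-py/Selenium methods.

```python
from contextlib import ExitStack


def capture_with_cleanup(start_server, start_driver, run):
    """Start server, proxy and driver, call run(driver, proxy), always clean up.

    start_server() must return an already-started object with create_proxy()
    and stop(); the proxy must have close(); start_driver(proxy) must return
    an object with quit(). Cleanup runs in reverse order even if run() raises.
    """
    with ExitStack() as stack:
        server = start_server()
        stack.callback(server.stop)      # runs last
        proxy = server.create_proxy()
        stack.callback(proxy.close)
        driver = start_driver(proxy)
        stack.callback(driver.quit)      # runs first
        return run(driver, proxy)
```

With the real libraries, `start_server` would build `Server("path/to/browsermob-proxy")`, call `start()` and return it, `start_driver` would build the Firefox profile from `proxy.selenium_proxy()` as above, and `run` would call `proxy.new_har(...)`, `driver.get(...)` and return `proxy.har`.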
Upvotes: 3
Reputation: 4066
When you do:
proxy.har
You need to handle that value yourself: proxy.har returns the HAR as a Python dict (parsed from the proxy's JSON response), so to generate a file you need to serialize it back to JSON, like this:
import json
myFile = open('BETA_HAR.har', 'w')
myFile.write(json.dumps(proxy.har))
myFile.close()
Then you will find your .har
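The serialization step matters because `str()` on a dict produces Python's repr (single-quoted strings, and `None`/`True` instead of `null`/`true`), which JSON parsers and HAR viewers reject, while `json.dumps` produces real JSON. A small self-contained demonstration with a hypothetical stand-in for `proxy.har`:

```python
import json

# Hypothetical stand-in for what proxy.har returns: a Python dict, not a string.
har = {"log": {"version": "1.2", "entries": []}}

repr_text = str(har)         # Python repr: single quotes, not valid JSON
json_text = json.dumps(har)  # real JSON: double quotes

print(repr_text)
print(json_text)
```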
Upvotes: 0
Reputation: 71
I use PhantomJS; here is an example of how to use it with Python:
import browsermobproxy as mob
import json
from selenium import webdriver
BROWSERMOB_PROXY_PATH = '/usr/share/browsermob/bin/browsermob-proxy'
url = 'http://google.com'
s = mob.Server(BROWSERMOB_PROXY_PATH)
s.start()
proxy = s.create_proxy()
proxy_address = "--proxy=127.0.0.1:%s" % proxy.port
service_args = [ proxy_address, '--ignore-ssl-errors=yes', ] # so that I can make HTTPS connections
driver = webdriver.PhantomJS(service_args=service_args)
driver.set_window_size(1400, 1050)
proxy.new_har(url)
driver.get(url)
har_data = json.dumps(proxy.har, indent=4)
screenshot = driver.get_screenshot_as_png()
imgname = "google.png"
harname = "google.har"
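save_img = open(imgname, 'wb')  # PNG bytes need binary mode; 'w' overwrites old runs
save_img.write(screenshot)
save_img.close()
save_har = open(harname, 'w')  # 'a' would append a second JSON document on rerun
save_har.write(har_data)
save_har.close()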
driver.quit()
s.stop()
Upvotes: 2