Reputation: 63
For our privacy policy, we would like to write a crawler that automatically lists all third-party connections as well as all cookies. The crawl should run daily, and its results should be compared with the existing ones.
For the implementation I used Python 3 and Selenium with Firefox.
get_cookies() returns only cookies that are intended for the current domain. -> Does not work correctly.
import time
from selenium import webdriver

firefox_profile = webdriver.FirefoxProfile("/home/user/.mozilla/firefox/rkggssrl.SeleniumTest")
browser = webdriver.Firefox(firefox_profile, executable_path=r'./geckodriver')
browser.get('https://www.lenovo.com/')  # or any other site
time.sleep(20)  # give the page time to load and set its cookies
I've already tried the suggestions from the question "Selenium test runs won't save cookies?".
Questions:
Upvotes: 1
Views: 1653
The solution to question 3 is relatively simple and automatically answers all the other questions as well. I'll leave it as it is. Selenium creates a new temporary profile at every start. Once its location is known, I can read the cookies.sqlite file from it and get all cookies.
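Reading the cookie file could look roughly like the sketch below. It assumes the cookies are stored in a table named moz_cookies with the columns host, name, value, path and expiry, which may differ between Firefox versions. The function name read_firefox_cookies is made up for this example.

```python
import sqlite3
from pathlib import Path


def read_firefox_cookies(cookie_db_path):
    """Return (host, name, value, path, expiry) rows from a cookies.sqlite file.

    Assumption: the file has a moz_cookies table with these columns.
    """
    # Open read-only through a file: URI so a wrong path raises an error
    # instead of silently creating an empty database.
    uri = Path(cookie_db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        cursor = conn.execute(
            "SELECT host, name, value, path, expiry "
            "FROM moz_cookies ORDER BY host, name"
        )
        return cursor.fetchall()
    finally:
        conn.close()
```

Calling read_firefox_cookies("yourOutputDestination") on a saved copy would then return one tuple per cookie, including those set for other domains. If the browser is still running, the newest cookies may not have been written to cookies.sqlite yet, so it is probably safer to read a copy taken at the end of the session.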
How I figured it out:
Using the dir() function (https://docs.python.org/3/library/functions.html#dir), I got the following:
print(dir(browser))
['CONTEXT_CHROME', 'CONTEXT_CONTENT', 'NATIVE_EVENTS_ALLOWED', '__class__', '__delattr__',
'__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__',
'__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__',
'__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__',
'_file_detector', '_is_remote', '_mobile', '_switch_to', '_unwrap_value',
'_web_element_cls', '_wrap_value', 'add_cookie', 'application_cache', 'back', 'binary',
'capabilities', 'close', 'command_executor', 'context', 'create_web_element',
'current_url', 'current_window_handle', 'delete_all_cookies', 'delete_cookie',
'desired_capabilities', 'error_handler', 'execute', 'execute_async_script',
'execute_script', 'file_detector', 'file_detector_context', 'find_element',
'find_element_by_class_name', 'find_element_by_css_selector', 'find_element_by_id',
'find_element_by_link_text', 'find_element_by_name', 'find_element_by_partial_link_text',
'find_element_by_tag_name', 'find_element_by_xpath', 'find_elements',
'find_elements_by_class_name', 'find_elements_by_css_selector', 'find_elements_by_id',
'find_elements_by_link_text', 'find_elements_by_name',
'find_elements_by_partial_link_text', 'find_elements_by_tag_name',
'find_elements_by_xpath', 'firefox_profile', 'forward', 'fullscreen_window', 'get',
'get_cookie', 'get_cookies', 'get_log', 'get_screenshot_as_base64',
'get_screenshot_as_file', 'get_screenshot_as_png', 'get_window_position',
'get_window_rect', 'get_window_size', 'implicitly_wait', 'install_addon', 'log_types',
'maximize_window', 'minimize_window', 'mobile', 'name', 'orientation', 'page_source',
'profile', 'quit', 'refresh', 'save_screenshot', 'service', 'session_id', 'set_context',
'set_page_load_timeout', 'set_script_timeout', 'set_window_position', 'set_window_rect',
'set_window_size', 'start_client', 'start_session', 'stop_client', 'switch_to',
'switch_to_active_element', 'switch_to_alert', 'switch_to_default_content',
'switch_to_frame', 'switch_to_window', 'title', 'uninstall_addon', 'w3c',
'window_handles']
print(browser.capabilities)
{'acceptInsecureCerts': True, 'browserName': 'firefox', 'browserVersion': '76.0.1',
'moz:accessibilityChecks': False, 'moz:buildID': '20200507114007',
'moz:geckodriverVersion': '0.26.0', 'moz:headless': False, 'moz:processID': 110409,
'moz:profile': '/tmp/rust_mozprofilel3IJsK', 'moz:shutdownTimeout': 60000,
'moz:useNonSpecCompliantPointerOrigin': False, 'moz:webdriverClick': True,
'pageLoadStrategy': 'normal', 'platformName': 'linux',
'platformVersion': '5.3.0-1020-azure', 'rotatable': False, 'setWindowRect': True,
'strictFileInteractability': False, 'timeouts': {'implicit': 0, 'pageLoad': 300000,
'script': 30000}, 'unhandledPromptBehavior': 'dismiss and notify'}
The directory of the temporary profile can now be read from the 'moz:profile' entry of the capabilities. Before closing Selenium, the following is executed:
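import os
import shutil

# Save cookie file
outputDest = "yourOutputDestination"
capability = browser.capabilities
browserDir = capability['moz:profile']
cookieFile = os.path.join(browserDir, 'cookies.sqlite')
# shutil.move also works when the temporary profile and the destination are on
# different filesystems, where os.rename can fail
shutil.move(cookieFile, outputDest)
# Close selenium
if browser is not None:
    browser.close()
    browser.quit()
    browser = None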
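For the daily comparison mentioned in the question, something like the following sketch might be enough. It assumes each run has already been reduced to (host, name) pairs. The helper names diff_cookies and is_third_party are hypothetical, and the third-party check is a naive suffix match that does not consult the Public Suffix List.

```python
def diff_cookies(previous, current):
    """Compare two iterables of (host, name) pairs from two crawler runs."""
    prev_set, curr_set = set(previous), set(current)
    return {
        "added": sorted(curr_set - prev_set),    # new since the last run
        "removed": sorted(prev_set - curr_set),  # no longer set
    }


def is_third_party(cookie_host, site_domain):
    """Naive check: the cookie host is neither the site domain nor a subdomain of it."""
    host = cookie_host.lstrip(".").lower()  # cookie hosts may start with a dot
    site = site_domain.lower()
    return not (host == site or host.endswith("." + site))
```

For example, diff_cookies(yesterday, today)["added"] would list the cookies that appeared since the previous run, and filtering those with is_third_party(host, "lenovo.com") would leave the ones set by other domains.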
Upvotes: 3