reike

Reputation: 129

Generating a list of URLs with Selenium Python

I'm trying to generate a list of URLs with Selenium. I would like the user to navigate freely in the instrumented browser, and at the end I want a list of the URLs they visited.

I found that the "current_url" property could help with that, but I haven't found a way to detect that the user clicked on a link.

In [117]: from selenium import webdriver

In [118]: browser = webdriver.Chrome()

In [119]: browser.get("http://stackoverflow.com")

--> here, I click on the "Questions" link.

In [120]: browser.current_url

Out[120]: 'http://stackoverflow.com/questions'

--> here, I click on the "Jobs" link.

In [121]: browser.current_url

Out[121]: 'http://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab'

Any hints appreciated!

Thank you.

Upvotes: 2

Views: 947

Answers (1)

crookedleaf

Reputation: 2198

There isn't really an official way to monitor what a user is doing in Selenium. About the only thing you can do is start the driver and then run a loop that keeps checking driver.current_url. I don't know the best way to exit this loop, since I don't know your use case. Maybe try something like:

import time

from selenium import webdriver


urls = []

driver = webdriver.Firefox()

current = 'http://www.google.com'
driver.get('http://www.google.com')
while True:
    if driver.current_url != current:
        current = driver.current_url

        # keep only ONE of the two options below; with both active,
        # the second check never adds anything

        # if you want to capture every URL, including duplicates:
        urls.append(current)

        # or if you only want to capture unique URLs:
        # if current not in urls:
        #     urls.append(current)

    # pause between checks so the loop doesn't spin the CPU
    time.sleep(0.5)

If you don't have any idea how to end this loop, one option is to have the user navigate to a URL that breaks the loop, such as http://www.endseleniumcheck.com, and check for it in the code like this:

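import time

from selenium import webdriver


urls = []

driver = webdriver.Firefox()

current = 'http://www.google.com'
driver.get('http://www.google.com')
while True:
    # startswith also matches if the browser reports the URL with a trailing slash
    if driver.current_url.startswith('http://www.endseleniumcheck.com'):
        break

    if driver.current_url != current:
        current = driver.current_url

        # keep only ONE of the two options below; with both active,
        # the second check never adds anything

        # if you want to capture every URL, including duplicates:
        urls.append(current)

        # or if you only want to capture unique URLs:
        # if current not in urls:
        #     urls.append(current)

    # pause between checks so the loop doesn't spin the CPU
    time.sleep(0.5)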
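
A stop-URL check that compares full strings can be fragile, because the browser may report the address with a trailing slash, a path, or a different scheme. Here is a minimal sketch of a more forgiving check that compares only the host name. Note that is_stop_url and its stop_host parameter are hypothetical names of mine, not part of Selenium:

```python
from urllib.parse import urlsplit


def is_stop_url(url, stop_host="www.endseleniumcheck.com"):
    """Return True if url points at the agreed "stop" host.

    Hypothetical helper: only the host name is compared, so the scheme,
    path, query string, and trailing slash are all ignored.
    """
    return urlsplit(url).hostname == stop_host
```

In the loop you would then write something like `if is_stop_url(driver.current_url): break`.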

Or, if you want to get crafty, you can terminate the loop when the user exits the browser. You can do this by monitoring the process ID with the psutil library (pip install psutil):

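import time

from selenium import webdriver
import psutil


urls = []

driver = webdriver.Firefox()
# driver.binary comes from the Firefox driver in older Selenium releases;
# newer releases may not expose it
pid = driver.binary.process.pid

current = 'http://www.google.com'
driver.get('http://www.google.com')
while True:
    # stop once the browser process is gone
    if not psutil.pid_exists(pid):
        break

    if driver.current_url != current:
        current = driver.current_url

        # keep only ONE of the two options below; with both active,
        # the second check never adds anything

        # if you want to capture every URL, including duplicates:
        urls.append(current)

        # or if you only want to capture unique URLs:
        # if current not in urls:
        #     urls.append(current)

    # pause between checks so the loop doesn't spin the CPU
    time.sleep(0.5)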
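
Another way to end the loop without tracking the process ID is to treat an error from reading current_url as the sign that the browser is gone. This is only a rough sketch, not an official Selenium pattern. record_urls, watch_firefox, and their parameters are hypothetical names of mine. I'm also assuming that reading driver.current_url after the window is closed raises a WebDriverException subclass, which may vary with the driver and Selenium version (a connection error could surface instead):

```python
import time


def record_urls(get_url, poll_interval=0.5, unique=False, stop_on=(Exception,)):
    """Poll get_url() until it raises one of stop_on; return the URLs seen, in order.

    Hypothetical helper: get_url is any zero-argument callable, for example
    `lambda: driver.current_url`.
    """
    urls = []
    last = None
    while True:
        try:
            current = get_url()
        except stop_on:
            # assumed to mean the browser or session is gone
            break
        if current != last:
            last = current
            # unique=True skips URLs that were already recorded
            if not unique or current not in urls:
                urls.append(current)
        time.sleep(poll_interval)
    return urls


def watch_firefox(start_url="http://www.google.com"):
    """Open Firefox, let the user browse, and return the URLs once the window closes."""
    from selenium import webdriver
    from selenium.common.exceptions import WebDriverException

    driver = webdriver.Firefox()
    driver.get(start_url)
    return record_urls(lambda: driver.current_url, stop_on=(WebDriverException,))
```

Calling `watch_firefox()` blocks until the browser goes away and then returns the list. Keeping the polling logic in record_urls, separate from the driver, means you can try it out with a fake get_url before pointing it at a real browser.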

Upvotes: 2
