fabiobh

Reputation: 779

Is it possible to reduce RAM consumption when using Selenium, GeckoDriver, and Firefox?

I use Selenium and the Firefox WebDriver with Python to scrape data from a website.

But the script needs to access this website more than 10k times, and doing so consumes a lot of RAM.

Usually, by the time the script has accessed the site about 2,500 times, it is already consuming 4 GB or more of RAM and it stops working.

Is it possible to reduce RAM consumption without closing the browser session?

I ask because when I start the script I need to log in to the site manually (two-factor authentication, the code is not shown below), and if I close the browser session I will need to log in to the site again.

for itemLista in lista:
    driver.get("https://mytest.site.com/query/option?opt="+str(itemLista))

    isActivated = driver.find_element_by_xpath('//div/table//tr[2]//td[1]')
    activationDate = driver.find_element_by_xpath('//div/table//tr[2]//td[2]')

    print(str(isActivated.text))
    print(str(activationDate.text))

    indice+=1
    print("numero: "+str(indice))

    file2.write(itemLista+" "+str(isActivated.text)+" "+str(activationDate.text)+"\n")

#close file
file2.close()

Upvotes: 4

Views: 22722

Answers (3)

fabiobh

Reputation: 779

I discovered how to avoid the memory leak.

I just use

time.sleep(2)

after

file2.write(itemLista+" "+str(isActivated.text)+" "+str(activationDate.text)+"\n")

Now Firefox is working without consuming lots of RAM.

It is just perfect.

I don't know exactly why it stopped consuming so much memory, but I think the memory consumption kept growing because the script didn't give each driver.get request time to finish before the next one started.
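
For context, this is roughly what the loop from the question looks like with the pause added (variable names come from the question; the 2-second value is just what worked for me):

    import time

    for itemLista in lista:
        driver.get("https://mytest.site.com/query/option?opt="+str(itemLista))

        isActivated = driver.find_element_by_xpath('//div/table//tr[2]//td[1]')
        activationDate = driver.find_element_by_xpath('//div/table//tr[2]//td[2]')

        file2.write(itemLista+" "+str(isActivated.text)+" "+str(activationDate.text)+"\n")

        time.sleep(2)  # give each driver.get request time to finish before the next iteration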

Upvotes: 2

undetected Selenium

Reputation: 193268

It is not clear from your question what the items within lista are, so I can't check the actual url/website.

However, it may not be possible to reduce RAM consumption while accessing the website more than 10k times in a row with the approach you have adopted.

Solution

As you mentioned, when the script accesses this site 2500 times or so it already consumes 4 GB or more of RAM and stops working. You can add a counter so that after roughly 2000 iterations in the loop you invoke driver.quit() to close & destroy the existing WebDriver and Web Client instances gracefully, and then reinitialize the WebDriver and web browser afresh (see the sketch below):

driver.quit()  # Python
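
A minimal sketch of that batching pattern, assuming a Firefox driver and a batch size of 2000 (both are illustrative values, not requirements); note that a fresh browser session means the manual login would have to be repeated:

    from selenium import webdriver

    BATCH_SIZE = 2000  # assumed batch size; tune it to stay under your RAM limit

    driver = webdriver.Firefox()
    # ... perform the manual two-factor login here ...

    for indice, itemLista in enumerate(lista, start=1):
        driver.get("https://mytest.site.com/query/option?opt="+str(itemLista))
        # ... scrape and write the results as in the question ...

        if indice % BATCH_SIZE == 0:
            driver.quit()                 # close & destroy the existing instances gracefully
            driver = webdriver.Firefox()  # reinitialize the WebDriver and browser afresh
            # note: a fresh session means the site login has to be done again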

You can find a detailed discussion in PhantomJS web driver stays in memory

In case the GeckoDriver and Firefox processes are still not destroyed and removed, you may need to kill the processes from the task list.

  • Python Solution(Cross Platform):

    import os
    import psutil
    
    PROCNAME = "geckodriver" # or chromedriver or iedriverserver
    for proc in psutil.process_iter():
        # check whether the process name matches
        if proc.name() == PROCNAME:
            proc.kill()
    

You can find a detailed discussion in Selenium : How to stop geckodriver process impacting PC memory, without calling driver.quit()?

Upvotes: 1

r.ook

Reputation: 13898

As mentioned in my comment, only open and write to your file on each iteration instead of keeping it open in memory:

# remove the line file2 = open(...) from your code

for itemLista in lista:
    driver.get("https://mytest.site.com/query/option?opt="+str(itemLista))

    isActivated = driver.find_element_by_xpath('//div/table//tr[2]//td[1]')
    activationDate = driver.find_element_by_xpath('//div/table//tr[2]//td[2]')

    print(str(isActivated.text))
    print(str(activationDate.text))

    indice+=1
    print("numero: "+str(indice))

    with open("your file path here", "w") as file2:
        file2.write(itemLista+" "+str(isActivated.text)+" "+str(activationDate.text)+"\n")

While Selenium is quite a memory-hungry beast, it doesn't necessarily murder your RAM with each growing iteration. However, the growing open buffer of file2 does take up more RAM the more you write to it; only when the file is closed does it release that memory and write the data to disk.

Upvotes: 1
