Reputation: 779
I use Selenium and the Firefox WebDriver with Python to scrape data from a website.
In the code, I need to access this website more than 10k times, and that consumes a lot of RAM.
Usually, by the time the script has accessed the site 2,500 times, it already consumes 4 GB or more of RAM and stops working.
Is it possible to reduce RAM consumption without closing the browser session?
I ask because when I start the script I need to log in to the site manually (two-factor authentication; that code is not shown below), and if I close the browser session, I will need to log in to the site again.
for itemLista in lista:
    driver.get("https://mytest.site.com/query/option?opt="+str(itemLista))
    isActivated = driver.find_element_by_xpath('//div/table//tr[2]//td[1]')
    activationDate = driver.find_element_by_xpath('//div/table//tr[2]//td[2]')
    print(str(isActivated.text))
    print(str(activationDate.text))
    indice+=1
    print("numero: "+str(indice))
    file2.write(itemLista+" "+str(isActivated.text)+" "+str(activationDate.text)+"\n")
# close file
file2.close()
Upvotes: 4
Views: 22722
Reputation: 779
I discovered how to avoid the memory leak.
I just use
time.sleep(2)
after
file2.write(itemLista+" "+str(isActivated.text)+" "+str(activationDate.text)+"\n")
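For clarity, the loop from the question would then look like this (a sketch; driver, lista, indice, and file2 are set up as in the original code):

import time

for itemLista in lista:
    driver.get("https://mytest.site.com/query/option?opt="+str(itemLista))
    isActivated = driver.find_element_by_xpath('//div/table//tr[2]//td[1]')
    activationDate = driver.find_element_by_xpath('//div/table//tr[2]//td[2]')
    indice += 1
    file2.write(itemLista+" "+str(isActivated.text)+" "+str(activationDate.text)+"\n")
    time.sleep(2)  # pause so each driver.get request can finish before the next one starts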
Now Firefox works without consuming lots of RAM.
It is just perfect.
I don't know exactly why it stopped consuming so much memory, but I think memory consumption kept growing because the browser didn't have time to finish each driver.get request.
Upvotes: 2
Reputation: 193268
It is not clear from your question what the list items within lista contain, so the actual URL/website can't be checked.
However, it may not be possible to reduce RAM consumption with the approach you have adopted while accessing the website more than 10k times in a row.
As you mentioned that when the script accesses this site 2,500 times or so it already consumes 4 GB or more of RAM and stops working, you may introduce a counter so that after every 2,000 accesses in the loop you reinitialize the WebDriver and web browser afresh, invoking driver.quit() within a teardown step to close & destroy the existing WebDriver and web client instances gracefully, as follows:
driver.quit()  # Python
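A minimal sketch of that batching approach (BATCH_SIZE and the restart placement are my assumptions, not from the question; note that the manual two-factor login mentioned in the question would have to be repeated after each restart):

from selenium import webdriver

BATCH_SIZE = 2000  # reinitialize the browser after this many requests

driver = webdriver.Firefox()
for i, itemLista in enumerate(lista):
    if i > 0 and i % BATCH_SIZE == 0:
        driver.quit()                 # destroy the WebDriver and browser instances gracefully
        driver = webdriver.Firefox()  # start a fresh session
        # the manual (2FA) login would need to be redone at this point
    driver.get("https://mytest.site.com/query/option?opt=" + str(itemLista))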
You can find a detailed discussion in PhantomJS web driver stays in memory
In case the GeckoDriver and Firefox processes are still not destroyed and removed, you may need to kill the processes from the task list.
Python Solution(Cross Platform):
import os
import psutil

PROCNAME = "geckodriver"  # or chromedriver or iedriverserver

for proc in psutil.process_iter():
    # check whether the process name matches
    if proc.name() == PROCNAME:
        proc.kill()
You can find a detailed discussion in Selenium : How to stop geckodriver process impacting PC memory, without calling driver.quit()?
Upvotes: 1
Reputation: 13898
As mentioned in my comment, open and write to your file on each iteration instead of keeping it open for the whole run:
# remove the line file2 = open(...) from your code
for itemLista in lista:
    driver.get("https://mytest.site.com/query/option?opt="+str(itemLista))
    isActivated = driver.find_element_by_xpath('//div/table//tr[2]//td[1]')
    activationDate = driver.find_element_by_xpath('//div/table//tr[2]//td[2]')
    print(str(isActivated.text))
    print(str(activationDate.text))
    indice+=1
    print("numero: "+str(indice))
    with open("your file path here", "a") as file2:  # "a" appends; "w" would overwrite the file on every iteration
        file2.write(itemLista+" "+str(isActivated.text)+" "+str(activationDate.text)+"\n")
While selenium is quite a memory-hungry beast, it doesn't necessarily murder your RAM with each growing iteration. However, the growing open buffer of file2 does take up RAM the more you write to it. Only when it's closed will it release the virtual memory and write to the physical file.
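An alternative with the same intent (my own suggestion, not part of the answer above) is to keep the file open but flush the buffer to disk after each write:

file2 = open("your file path here", "w")
for itemLista in lista:
    driver.get("https://mytest.site.com/query/option?opt=" + str(itemLista))
    isActivated = driver.find_element_by_xpath('//div/table//tr[2]//td[1]')
    activationDate = driver.find_element_by_xpath('//div/table//tr[2]//td[2]')
    file2.write(itemLista + " " + isActivated.text + " " + activationDate.text + "\n")
    file2.flush()  # push the write buffer to disk immediately instead of waiting for close()
file2.close()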
Upvotes: 1