asheets

Reputation: 870

Web scraping with python and selenium: No connection could be made because the target machine actively refused it

I know this error has been discussed quite a bit, but there seems to be a different cause in each case. I am using Selenium with the following code to extract some data from a website, and I get the error mentioned above on the second call to browser.get(url).

import openpyxl, os
from selenium import webdriver

os.chdir('C://Users/user/Documents')
os.makedirs('GenBank Data', exist_ok = True)

book = openpyxl.load_workbook('Squirrel list 50 percent genus.xlsx')
sheet = book.active

dirs = 'C://Users/user/Documents/GenBank Data'
os.chdir(dirs)

browser = webdriver.Chrome(executable_path = 'C://Users/user/chromedriver.exe',
                           service_args = ['--ignore-ssl-errors=true', '--ssl-protocol=TLSv1'])

start_col = 7
end_col = 9
start_row = 2 
end_row = 160

url_root = 'https://www.ncbi.nlm.nih.gov/nuccore/'
url_end = '.1?report=fasta'

for y in range(start_col, end_col + 1):
    file = open(sheet.cell(row = 1, column = y).value, 'w')
    for x in range(start_row, end_row + 1):
        accession = sheet.cell(row = x, column = y).value
        if accession:
            print(accession)           
            url = url_root + accession + url_end
            browser.get(url)
            data = browser.find_element_by_tag_name('pre')
            file.write(data.text + '\n' + '\n')

            browser.quit()
    file.close()

I'm using my own machine and have limited knowledge of servers and ports, which seem to be the focus of answers to similar questions. Any help would be appreciated. I've copied the traceback below.

Traceback (most recent call last):

  File "<ipython-input-1-b8f523f5e981>", line 1, in <module>
    runfile('C:/Users/Alec/test.py', wdir='C:/Users/Alec')

  File "C:\Users\Alec\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)

  File "C:\Users\Alec\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/Alec/test.py", line 38, in <module>
    browser.get(url)

  File "C:\Users\Alec\selenium\webdriver\remote\webdriver.py", line 309, in get
    self.execute(Command.GET, {'url': url})

  File "C:\Users\Alec\selenium\webdriver\remote\webdriver.py", line 295, in execute
    response = self.command_executor.execute(driver_command, params)

  File "C:\Users\Alec\selenium\webdriver\remote\remote_connection.py", line 464, in execute
    return self._request(command_info[0], url, body=data)

  File "C:\Users\Alec\selenium\webdriver\remote\remote_connection.py", line 487, in _request
    self._conn.request(method, parsed_url.path, body, headers)

  File "C:\Users\Alec\Anaconda3\lib\http\client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)

  File "C:\Users\Alec\Anaconda3\lib\http\client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)

  File "C:\Users\Alec\Anaconda3\lib\http\client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)

  File "C:\Users\Alec\Anaconda3\lib\http\client.py", line 1026, in _send_output
    self.send(msg)

  File "C:\Users\Alec\Anaconda3\lib\http\client.py", line 964, in send
    self.connect()

  File "C:\Users\Alec\Anaconda3\lib\http\client.py", line 936, in connect
    (self.host,self.port), self.timeout, self.source_address)

  File "C:\Users\Alec\Anaconda3\lib\socket.py", line 722, in create_connection
    raise err

  File "C:\Users\Alec\Anaconda3\lib\socket.py", line 713, in create_connection
    sock.connect(sa)

ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it

Upvotes: 0

Views: 1990

Answers (1)

Tarun Lalwani

Reputation: 146630

You mentioned: "Even the website I try in the above code works well through the first iteration."

That helped me spot the issue in your code.

    accession = sheet.cell(row = x, column = y).value
    if accession:
        print(accession)           
        url = url_root + accession + url_end
        browser.get(url)
        data = browser.find_element_by_tag_name('pre')
        file.write(data.text + '\n' + '\n')

        browser.quit()

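Inside the if block you call browser.quit(), and on the next iteration the loop tries to load a URL with that same browser object, whose session no longer exists. quit() also shuts down the chromedriver process that Selenium talks to over a local port, so the next browser.get(url) has nothing to connect to. That's why you get the connection-refused socket error.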
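The refusal itself can be reproduced without Selenium. The following is only a rough sketch using the standard library: a plain local listener stands in for chromedriver, and `try_connect` is a hypothetical helper name, not part of any Selenium API. Closing the listener plays the role of browser.quit().

```python
import socket

def try_connect(port):
    """Return True if something accepts TCP connections on 127.0.0.1:port."""
    try:
        with socket.create_connection(('127.0.0.1', port), timeout=2):
            return True
    except ConnectionRefusedError:
        # On Windows this is reported as WinError 10061.
        return False

# Stand-in for chromedriver: a listener on a free local port
# (port 0 asks the OS to pick one).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))
server.listen(1)
port = server.getsockname()[1]

first = try_connect(port)   # listener is up, so the connection succeeds
server.close()              # analogous to browser.quit() stopping the driver
second = try_connect(port)  # nothing is listening on that port any more

print(first, second)
```

The second attempt fails for the same reason the second browser.get(url) does: the process that was listening on the local port has been shut down.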

The solution is to move browser.quit() to the end of the code, outside both for loops, so that it runs only once, after every page has been fetched.
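A minimal sketch of that restructuring, assuming the loops are pulled into a helper. The name `fetch_all` and its `columns` argument are hypothetical; `browser` is the webdriver.Chrome instance from the question. It keeps the question's find_element_by_tag_name call, which newer Selenium releases replace with find_element(By.TAG_NAME, 'pre').

```python
def fetch_all(browser, columns, url_root, url_end):
    """Fetch every accession with one browser session, then quit once.

    columns is an iterable of (writable file object, list of accessions),
    one pair per spreadsheet column (a hypothetical layout for this sketch).
    """
    try:
        for out_file, accessions in columns:
            for accession in accessions:
                if not accession:
                    continue  # skip empty cells
                browser.get(url_root + accession + url_end)
                # Same call as in the question; on newer Selenium use
                # browser.find_element(By.TAG_NAME, 'pre') instead.
                data = browser.find_element_by_tag_name('pre')
                out_file.write(data.text + '\n\n')
    finally:
        # Runs exactly once, after both loops, even if a page fails to load.
        browser.quit()
```

Putting quit() in a finally clause is a design choice rather than a requirement: it ensures the driver is still shut down if one of the pages raises an exception partway through.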

Upvotes: 2
