Kunjal Pundeer

Reputation: 3

Web scraping python: IndexError: list index out of range

The script reads a single URL from a text file, scrapes information from that web page, and stores it in a CSV file. It works fine for a single URL. Problem: I have added several URLs to the text file, one per line, and now I want the script to read the first URL, perform the desired operation, then go back to the text file, read the second URL, and repeat. Once I added the for loop to do this, I started facing the error below:

Traceback (most recent call last):
  File "C:\Users\T947610\Desktop\hahah.py", line 22, in <module>
    table = soup.findAll("table", {"class":"display"})[0] #Facing error in this statement
IndexError: list index out of range

import requests
from bs4 import BeautifulSoup

f = open("URL.txt", 'r')
for line in f.readlines():
    print(line)
    page = requests.get(line)
    print(page.status_code)
    print(page.content)
    soup = BeautifulSoup(page.text, 'html.parser')
    print("soup command worked")
    table = soup.findAll("table", {"class":"display"})[0] #Facing error in this statement
    rows = table.findAll("tr")

Upvotes: 0

Views: 1099

Answers (2)

oppressionslayer

Reputation: 7224

findAll itself does not raise an exception when nothing matches: it returns an empty list, and indexing that empty list with [0] is what raises the IndexError. I have the same issue and work around it with try/except. You will probably need to handle the empty values differently than I have shown here, for example:

import requests
from bs4 import BeautifulSoup

f = open("URL.txt", 'r')
for line in f.readlines():
    print(line)
    page = requests.get(line)
    print(page.status_code)
    print(page.content)
    soup = BeautifulSoup(page.text, 'html.parser')
    print("soup command worked")
    try:
        # [0] raises IndexError when no matching table is found
        table = soup.findAll("table", {"class":"display"})[0]
        rows = table.findAll("tr")
    except IndexError:
        # No table on this page: fall back to empty values
        table = None
        rows = None
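
As an alternative to catching the exception, the result of find_all can be checked before indexing, since it is an empty list when nothing matches. A minimal sketch, assuming bs4 is installed; the helper name extract_rows and the inline HTML strings are hypothetical and only stand in for page.text:

```python
from bs4 import BeautifulSoup


def extract_rows(html):
    """Return the <tr> tags of the first table with class "display", or []."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns an empty list when nothing matches
    tables = soup.find_all("table", {"class": "display"})
    if not tables:
        return []
    return tables[0].find_all("tr")


# Hypothetical sample pages standing in for page.text
with_table = '<table class="display"><tr><td>a</td></tr><tr><td>b</td></tr></table>'
without_table = "<p>no table here</p>"

print(len(extract_rows(with_table)))
print(extract_rows(without_table))
```

Returning an empty list keeps the calling loop simple: a page without the table just contributes no rows instead of stopping the script.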

Upvotes: 1

Sheng Zhuang

Reputation: 697

If the script worked with a single URL, the new input lines from the .txt file may be the problem. Try applying .strip() to each line: lines returned by readlines() keep their trailing newline and may carry other leading or trailing whitespace.

page = requests.get(line.strip())
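
The whitespace that readlines() leaves on each line can be checked without any network access. A small sketch, using io.StringIO as a stand-in for URL.txt; the example.com URLs and the helper name read_urls are placeholders, not taken from the question:

```python
import io


def read_urls(handle):
    """Return the non-empty lines of an open text file, stripped of whitespace."""
    urls = []
    for line in handle:
        url = line.strip()  # drops the trailing "\n" and any stray spaces
        if url:  # skip blank lines
            urls.append(url)
    return urls


# Stand-in for open("URL.txt"); the URLs are placeholders
sample = io.StringIO("https://example.com/a\nhttps://example.com/b\n\n")

raw_lines = sample.readlines()
print(repr(raw_lines[0]))  # the raw line still ends with "\n"

sample.seek(0)
print(read_urls(sample))
```

Skipping blank lines also avoids passing an empty string to requests.get when the file ends with an extra newline.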

Also, if soup.findAll() finds nothing, it returns an empty list, so indexing it with [0] raises IndexError. Try printing the soup and checking its content.

Upvotes: 0
