Reputation: 3
The script reads a single URL from a text file, imports information from that web page, and stores it in a CSV file. The script works fine for a single URL. Problem: I have added several URLs to my text file, one per line, and now I want my script to read the first URL, do the desired operation, then go back to the text file to read the second URL and repeat. Once I added the for loop to get this done, I started facing the error below:
Traceback (most recent call last):
  File "C:\Users\T947610\Desktop\hahah.py", line 22, in <module>
    table = soup.findAll("table", {"class":"display"})[0] #Facing error in this statement
IndexError: list index out of range
import requests
from bs4 import BeautifulSoup

f = open("URL.txt", 'r')
for line in f.readlines():
    print(line)
    page = requests.get(line)
    print(page.status_code)
    print(page.content)
    soup = BeautifulSoup(page.text, 'html.parser')
    print("soup command worked")
    table = soup.findAll("table", {"class":"display"})[0]  # Facing error in this statement
    rows = table.findAll("tr")
Upvotes: 0
Views: 1099
Reputation: 7224
findAll itself doesn't raise here; when it can't find the data it returns an empty list, and indexing [0] into that empty list is what raises the IndexError. I have had this same issue and I work around it with try/except. You'll probably need to deal with the empty values differently than I've shown, but for example:
import requests
from bs4 import BeautifulSoup

f = open("URL.txt", 'r')
for line in f.readlines():
    print(line)
    page = requests.get(line)
    print(page.status_code)
    print(page.content)
    soup = BeautifulSoup(page.text, 'html.parser')
    print("soup command worked")
    try:
        table = soup.findAll("table", {"class":"display"})[0]
        rows = table.findAll("tr")
    except IndexError:
        table = None
        rows = None
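A variant of the same idea, sketched with inline HTML strings standing in for the fetched pages (the snippets below are hypothetical, not the asker's actual data), uses `continue` to skip a page entirely when the table is missing:

```python
from bs4 import BeautifulSoup

# Hypothetical page contents standing in for requests.get(line).text
pages = [
    '<table class="display"><tr><td>a</td></tr></table>',
    '<p>no table on this page</p>',
]

parsed = []
for html in pages:
    soup = BeautifulSoup(html, "html.parser")
    try:
        table = soup.findAll("table", {"class": "display"})[0]
    except IndexError:
        continue  # skip pages without the table instead of crashing
    parsed.append(len(table.findAll("tr")))

print(parsed)  # one row count per page that actually had the table
```

Skipping with `continue` keeps the loop's remaining body from running on `None` values, which is often simpler than checking `table is None` further down.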
Upvotes: 1
Reputation: 697
If the single-URL input was working, the new input line from the .txt file is probably the problem. Try applying .strip() to the line; lines read with readlines() keep the trailing newline and may have other whitespace at the head and tail:
page = requests.get(line.strip())
Also, if soup.findAll() finds nothing, it returns an empty list, and indexing [0] into an empty list raises the IndexError. Try printing the soup and checking the content.
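A minimal sketch of that guard, with an inline HTML string standing in for the fetched page (the snippet is hypothetical):

```python
from bs4 import BeautifulSoup

html = "<html><body><p>No table here</p></body></html>"  # stand-in for page.text
soup = BeautifulSoup(html, "html.parser")

tables = soup.findAll("table", {"class": "display"})
print(tables)  # [] -- an empty list, not None

if tables:  # guard before indexing to avoid IndexError
    rows = tables[0].findAll("tr")
else:
    rows = []
```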
Upvotes: 0