I'm trying to write a web spider to gather some links and text. I'm working with a table where the second cell of each row holds a number. All I want to do is get that number and, if it's one I need, grab the links and text in cells 2 and 4.
Everything works fine except that I can't seem to compare the numbers from the cell to a list of numbers I have.
I get the number using `cells[1].get_text()` (I build a list of all the cells for each row). This works fine, and `type()` returns `<class 'str'>`. I also make sure to convert my list of numbers to strings.
But when I try to compare them, the comparison always returns `False`:
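Two strings can both be `str`, and can even look identical when printed, yet still differ. Printing `repr()` of both sides usually shows why. The snippet below is a minimal sketch that assumes (hypothetically) that the cell text carries surrounding whitespace, which is common in hand-formatted HTML tables:

```python
# Hypothetical values: the cell text is assumed to carry surrounding whitespace,
# which print() hides but repr() makes visible.
check = "\n 5 \n"
n = "5"

print(type(check), type(n))  # both are <class 'str'>
print(repr(check), repr(n))  # '\n 5 \n' '5'
print(n == check)            # False: the extra characters break equality
print(n == check.strip())    # True once the whitespace is removed
```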
```python
import bs4

# The numbers 5..42, converted to strings so they can be compared with cell text
rng_lst = [str(x) for x in range(5, 43)]

with open(r"some html file", 'rb') as file:
    soup = bs4.BeautifulSoup(file, 'html.parser')

table = soup.find_all('table')[0]
for row in table.find_all('tr'):
    cells = row.find_all('td')
    if len(cells) >= 6:
        check = cells[1].get_text()
        for n in rng_lst:
            if n == check:
                pass  # do stuff
```
I've tried everything I can think of and I ALWAYS get `False`. Neither `==` nor `is` works. Using `in` does work, but then if I need number 5, I also get 15 or 25.
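The `in` behaviour has a simple explanation: between two strings, `in` is a substring test, while against a list it compares whole elements with `==`. (`is` checks object identity rather than equality, so it is the wrong tool for comparing strings.) A small sketch with a hypothetical padded cell value:

```python
wanted = [str(x) for x in range(5, 43)]
check = " 15 "  # hypothetical cell text with padding

# `in` on two strings is a substring test, so "5" matches inside "15" or "25".
print("5" in check)             # True
print("5" in "25")              # True

# `in` on a list compares whole elements with ==, so only exact matches count.
print(check in wanted)          # False: the padding still breaks equality
print(check.strip() in wanted)  # True

# Comparing as integers also works: int() ignores surrounding whitespace.
print(5 <= int(check) <= 42)    # True
```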
Most likely, you just need to strip the text you are getting from a cell:
```python
check = cells[1].get_text(strip=True)
```
It is still a guess, but an "educated" one.
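Applied to the loop from the question, the fix might look like the sketch below. It assumes `bs4` is installed and uses a small hypothetical inline HTML table in place of "some html file". It also swaps the inner `for` loop for a set lookup, which keeps the exact-match semantics of `==`:

```python
import bs4

# Hypothetical inline HTML standing in for "some html file"; the whitespace
# inside the second <td> of each row mimics hand-formatted table source.
html = """
<table>
  <tr><td>a</td><td>
      5
  </td><td><a href="/x">x</a></td><td>text</td><td>e</td><td>f</td></tr>
  <tr><td>a</td><td> 15 </td><td><a href="/y">y</a></td><td>more</td><td>e</td><td>f</td></tr>
  <tr><td>a</td><td>99</td><td><a href="/z">z</a></td><td>skip</td><td>e</td><td>f</td></tr>
</table>
"""

# A set of strings: membership is an exact (==) match, not a substring test.
wanted = {str(x) for x in range(5, 43)}

soup = bs4.BeautifulSoup(html, "html.parser")
table = soup.find_all("table")[0]
matches = []
for row in table.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) >= 6:
        # strip=True trims whitespace from each text piece before joining them
        check = cells[1].get_text(strip=True)
        if check in wanted:
            matches.append(check)  # this is where "do stuff" would go

print(matches)  # ['5', '15']
```

With the text stripped, `'5'` and `'15'` match exactly, and `'99'` is correctly skipped.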