DanielSon

Reputation: 1545

How am I getting two different results from same Python print command?

For the first `print tag` I get a large list of several hundred `<a>` tags. For the second `print tag` I get a list of only four `<a>` tags, not including the ones that I want.

One of the tags that I want is at the end of `tags`. After printing all several hundred tags, I print the last tag, and that prints the correct end tag as it should. But when I run another for loop over the same (unchanged) list `tags`, the result is not just different but significantly different.

The phenomenon happens with or without the `print '\n\n\n'`; that line is only there to make the split between the two prints easier for me to see.

What is happening to this list in between the first and second for loop to cause this problem?

(This code is exactly as I have it in my script. Originally I didn't have the lines from the first for loop until the empty line, and am doing this to debug the lack of the correct URL from the end result.)

EDIT: Also, the output of all the print statements is shown after the code (only the last section of the output from the first for loop is included):

import urllib
from bs4 import BeautifulSoup

startingList = ['http://www.stowefamilylaw.co.uk/']
for url in startingList:
    try:
        html = urllib.urlopen(url)
        soup = BeautifulSoup(html,'lxml')
        tags = soup('a')
        for tag in tags:
            print tag
        print tags[-1]
        print '\n\n\n'

        for tag in tags:
            print tag
            if not tag.get('href', None).startswith('..'):
                continue
    except:
        continue

....

<a class="shiftnav-target" href="http://www.stowefamilylaw.co.uk/faq-category/decrees-orders-forms/" itemprop="url">Decrees, Orders &amp; Forms</a>
<a class="shiftnav-target" href="http://www.stowefamilylaw.co.uk/faq-category/international-divorce/" itemprop="url">International Divorce</a>
<a class="shiftnav-target"><i class="fa fa-chevron-left"></i> Back</a>
<a class="shiftnav-target" href="http://www.stowefamilylaw.co.uk/contact/" itemprop="url"><i class="fa fa-phone"></i> Contact</a>
<a class="shiftnav-target" href="http://www.stowefamilylaw.co.uk/contact/" itemprop="url"><i class="fa fa-phone"></i> Contact</a>




<a href="http://www.stowefamilylaw.co.uk/">Stowe Family Law</a>
<a href="#spu-5086" style="color: #fff"><div class="callbackbutton"><i class="fa fa-phone" style="font-size: 16px"></i> Request Callback </div></a>
<a href="#spu-5084" style="color: #fff"><div class="callbackbutton"><i class="fa fa-envelope-o" style="font-size: 16px"></i> Quick Enquiry </div></a>
<a class="ubermenu-responsive-toggle ubermenu-responsive-toggle-main ubermenu-skin-black-white-2 ubermenu-loc-primary" data-ubermenu-target="ubermenu-main-3-primary"><i class="fa fa-bars"></i>Main Menu</a>

Upvotes: 0

Views: 338

Answers (1)

Martijn Pieters

Reputation: 1122082

You have a blanket `except:` handler:

try:
    # ...
except:
    continue

so any error in the block is masked and the rest of the block is skipped for that URL. Never use a blanket `except:` handler without re-raising; see "Why is "except: pass" a bad programming practice?". At the very least, catch only `Exception` and print the error:

except Exception as e:
    print 'Encountered:', e

Without proper diagnostics all we can do is guess.
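
As a minimal sketch of how such a handler hides a failure, the snippet below runs the check from the question's second loop over a few stand-in tags. The dicts and the `scan` helper are hypothetical; they only mimic `tag.get()` and are not the real page data. It is written in Python 3 syntax, unlike the Python 2 code in the question.

```python
# Hypothetical stand-ins for parsed <a> tags: plain dicts whose .get()
# behaves like Tag.get() for the purposes of this sketch.
tags = [
    {'href': 'http://www.stowefamilylaw.co.uk/'},
    {'href': '#spu-5086'},
    {'class': 'ubermenu-responsive-toggle'},  # no href attribute
    {'href': 'http://www.stowefamilylaw.co.uk/contact/'},
]


def scan(tags):
    """Run the question's check; return (tags visited, exception or None)."""
    visited = []
    try:
        for tag in tags:
            visited.append(tag)
            if not tag.get('href', None).startswith('..'):
                continue
    except Exception as e:
        # Reporting the exception instead of silently continuing
        # shows why the loop stopped early.
        return visited, e
    return visited, None


visited, error = scan(tags)
print(len(visited), 'of', len(tags), 'tags visited')
print('Encountered:', repr(error))
```

The loop stops at the first stand-in tag without an `href`, and the caught exception says why. This is consistent with the output in the question, where the last tag printed by the second loop has no `href`.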

One error you definitely have is an `AttributeError` here whenever a tag has no `href` attribute: `tag.get('href', None)` then returns `None`, and the `None` object doesn't have a `startswith` attribute:

if not tag.get('href', None).startswith('..'):

Instead of None return an empty string:

if not tag.get('href', '').startswith('..'):

or better yet, select only a tags with an href attribute:

tags = soup.select('a[href]')
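
Both fixes can be sketched without fetching the page. The dicts below are hypothetical stand-ins for parsed tags, and the `'href' in t` filter only approximates what `soup.select('a[href]')` does in BeautifulSoup; it is not that call. The example uses Python 3 syntax.

```python
# Hypothetical stand-ins for parsed <a> tags (not the real page data).
tags = [
    {'href': '../about/'},
    {'class': 'shiftnav-target'},  # no href attribute
    {'href': 'http://www.stowefamilylaw.co.uk/contact/'},
]

# Fix 1: default to '' so .startswith() is always called on a string.
relative = [t for t in tags if t.get('href', '').startswith('..')]

# Fix 2: drop tags without an href up front, then index directly.
# This filter stands in for selecting only a[href] elements.
with_href = [t for t in tags if 'href' in t]
relative_2 = [t for t in with_href if t['href'].startswith('..')]

print(relative)
print(relative == relative_2)
```

Either way no call to `.startswith()` ever lands on `None`, so the loop runs over every tag.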

Upvotes: 3
