MQ1217

Reputation: 37

Python BeautifulSoup error element invisible when trying to find href?

I am trying to find a URL containing '.ics' in an href. I tested this code the other day and it was working perfectly, but now when I loop with 'for link in links', 'print link' only shows:

<a class="element-invisible element-focusable" href="#main-content" tabindex="1">Skip to main content</a>
<a class="element-invisible element-focusable" href="#main-content">Skip to main content</a>

Because of this, the 'if link.get('href')' check is never satisfied and the URL is not returned. What is causing this, and is there another way to return the URL containing '.ics'?

page = requests.get('https://registrar.fas.harvard.edu/calendar').content
soup = bs4.BeautifulSoup(page, 'lxml')

links = soup.find_all('a')
#print links    
for link in links:
    print link    

    if link.get('href') != None and '.ics' in link.get('href'):
        endout = link.get('href')

        if endout[:6] == 'webcal':
            endout ='https' + endout[6:]
        print
        print 'URL: ' + endout
        print
        return endout
    break

Upvotes: 0

Views: 83

Answers (1)

cs95

Reputation: 402263

I would recommend streamlining your search by passing find_all an attribute filter on href with a regex pattern:

import re

links = soup.find_all('a', {'href': re.compile(r'\.ics')})

Output:

[<a class="subscribe" href="https://registrar.fas.harvard.edu/calendar/upcoming/all/export.ics">subscribe</a>,
 <a class="ical" href="https://registrar.fas.harvard.edu/calendar/upcoming/all/export.ics">iCal</a>]

You won't have to jump through hoops to validate your anchor tags now.
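
For completeness, here is a minimal, self-contained sketch of how that filter could be combined with the webcal-to-https rewrite from the question. It is only an illustration under a few assumptions: the function name find_ics_url and the SAMPLE_HTML markup are hypothetical stand-ins (the webcal link in particular is invented so the rewrite has something to act on), and it uses the built-in 'html.parser' rather than 'lxml' so nothing beyond bs4 needs to be installed.

```python
import re

import bs4

# Hypothetical sample markup standing in for the live calendar page.
SAMPLE_HTML = """
<a class="element-invisible element-focusable" href="#main-content">Skip to main content</a>
<a class="subscribe" href="webcal://registrar.fas.harvard.edu/calendar/upcoming/all/export.ics">subscribe</a>
<a class="ical" href="https://registrar.fas.harvard.edu/calendar/upcoming/all/export.ics">iCal</a>
"""


def find_ics_url(html):
    """Return the first href containing '.ics' (webcal rewritten to https), or None."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    # Only anchors whose href matches the regex are considered, so the
    # "Skip to main content" link is never looked at.
    link = soup.find('a', href=re.compile(r'\.ics'))
    if link is None:
        return None
    url = link['href']
    # Same prefix swap as in the question's code.
    if url.startswith('webcal'):
        url = 'https' + url[len('webcal'):]
    return url


print(find_ics_url(SAMPLE_HTML))
```

As for the cause: in the snippet as posted, the break sits at the loop-body level rather than inside the if, so the loop appears to stop after the first anchor, which is the "Skip to main content" link. Filtering up front avoids the loop entirely.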

Upvotes: 3
