Python BeautifulSoup error element invisible when trying to find href?

Question

I am trying to find a URL containing '.ics' in an href. I tested this code the other day and it was working perfectly, but now, when I loop with 'for link in links', 'print link' outputs only the page's 'Skip to main content' anchor:

Skip to main content

Because of this, the 'if link.get('href')' condition is never satisfied and the URL is never returned. What is causing this, and is there another way to return the URL containing '.ics'?

page = requests.get('https://registrar.fas.harvard.edu/calendar').content
soup = bs4.BeautifulSoup(page, 'lxml')

links = soup.find_all('a')
#print links    
for link in links:
    print link    

    if link.get('href') != None and '.ics' in link.get('href'):
        endout = link.get('href')

        if endout[:6] == 'webcal':
            endout ='https' + endout[6:]
        print
        print 'URL: ' + endout
        print
        return endout
    break

cs95 · Accepted Answer

I would recommend streamlining your search by passing find_all an href attribute filter with a regex pattern:

import re
links = soup.find_all('a', {'href': re.compile(r'\.ics')})

Output (the two matching anchors, shown by their link text):

[subscribe,
 iCal]

You won't have to jump through hoops to validate your anchor tags now.
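
One more thing worth checking in the question's loop: the 'break' sits at the loop-body level, so the loop exits after the first anchor whether or not it matched, which would explain why only 'Skip to main content' is ever printed. Below is a minimal, self-contained sketch of the regex filter combined with the question's webcal-to-https rewrite. SAMPLE_HTML and its hrefs are made up for illustration (they are not the real page's markup), find_ics_url is a hypothetical helper name, and 'html.parser' is used only so the sketch runs without lxml installed.

```python
import re

import bs4

# Hypothetical markup standing in for the real page; the hrefs are made up.
SAMPLE_HTML = """
<a class="element-invisible" href="#main-content">Skip to main content</a>
<a href="/about">About</a>
<a href="webcal://example.org/calendar/export.ics">subscribe</a>
<a href="https://example.org/calendar/export.ics">iCal</a>
"""


def find_ics_url(html):
    """Return the first href containing '.ics', or None if there is none."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    # The compiled pattern is applied with search(), so no leading '.*' is needed.
    # Anchors without an href are skipped by the filter itself.
    for link in soup.find_all('a', href=re.compile(r'\.ics')):
        url = link['href']
        # Rewrite webcal:// to https://, as the question's code does.
        if url.startswith('webcal'):
            url = 'https' + url[len('webcal'):]
        return url
    # No stray 'break' here: the loop only ends early when a match is returned.
    return None


print(find_ics_url(SAMPLE_HTML))
```

Because the filter already guarantees that every returned anchor has a matching href, the 'link.get('href') != None' check from the question is no longer needed.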
