En_g_neer

Reputation: 107

BeautifulSoup .link.get("href") only returns None

I'm playing around with BeautifulSoup while working on my web scraper. My links variable returns the blocks of markup I specify, but as soon as I try to grab the "href" from them, all I get is None.

from bs4 import BeautifulSoup
import requests

r = requests.get("https://www.kickstarter.com/discover/advanced?sort=most_funded")

pageGrab = BeautifulSoup(r.content, "html.parser")

#This comment below is another way I tried
#for link in pageGrab.find_all("div", {"class" : "project-profile-title text-truncate-xs"}):

links = pageGrab.find_all("div", {"class" : "project-profile-title text-truncate-xs"})
for link in links:
    print (link.get("href"))

If I run this script on, say, reddit, some links are grabbed, but the vast majority come back as None.

This is my first target on the page for extracting the "href":

<a target="" href="/projects/getpebble/pebble-time-awesome-smartwatch-no-compromises?ref=most_funded">Pebble Time - Awesome Smartwatch, No Compromises</a>

Upvotes: 4

Views: 4223

Answers (2)

Josh Crozier

Reputation: 241238

You are selecting the div elements, which clearly don't have href attributes.
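A minimal sketch of why this happens, assuming a hypothetical inline HTML snippet modelled on the markup in the question (the live Kickstarter page may differ). `.get("href")` reads an attribute from the tag it is called on, so the div returns None while the a element nested inside it carries the value:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet modelled on the markup shown in the question.
html = """
<div class="project-profile-title text-truncate-xs">
  <a target="" href="/projects/getpebble/pebble-time-awesome-smartwatch-no-compromises?ref=most_funded">Pebble Time - Awesome Smartwatch, No Compromises</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"class": "project-profile-title text-truncate-xs"})

div_href = div.get("href")       # the div itself has no href attribute
anchor_href = div.a.get("href")  # the nested <a> does

print(div_href)
print(anchor_href)
```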

You could simplify your code by using the .select() method to target the a elements nested inside those div elements directly:

links = pageGrab.select('.project-profile-title.text-truncate-xs a')
for link in links:
    print (link.get('href'))

Of course, you could also keep your existing code and chain the .find() method onto each div element. However, that assumes every div contains an a element: if one does not, .find('a') returns None and the .get() call raises an AttributeError, so the .select() version above is safer.

divs = pageGrab.find_all("div", {"class" : "project-profile-title text-truncate-xs"})
for div in divs:
    print (div.find('a').get("href"))
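If you prefer the find_all() approach, one way to avoid that failure is to check the result of .find() before reading the attribute. This is only a sketch: the HTML below is a hypothetical stand-in for the real page, with one div deliberately left without a link.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second div has no <a> inside it.
html = """
<div class="project-profile-title text-truncate-xs"><a href="/projects/a">A</a></div>
<div class="project-profile-title text-truncate-xs">No link here</div>
"""

soup = BeautifulSoup(html, "html.parser")

hrefs = []
for div in soup.find_all("div", {"class": "project-profile-title text-truncate-xs"}):
    anchor = div.find("a")
    if anchor is None:
        # Skip divs without an anchor instead of raising AttributeError.
        continue
    hrefs.append(anchor.get("href"))

print(hrefs)
```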

Additionally, if you want to take it a step further, the .select() method accepts most CSS selectors, so you could add the [href] attribute selector to select only the anchor elements that actually have an href attribute:

links = pageGrab.select('.project-profile-title.text-truncate-xs a[href]')
for link in links:
    print (link.get('href'))
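To see what the [href] selector changes, here is a small comparison on hypothetical markup (not the real page) in which one anchor has no href attribute:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second anchor has no href attribute.
html = """
<div class="project-profile-title text-truncate-xs"><a href="/projects/a">A</a></div>
<div class="project-profile-title text-truncate-xs"><a name="no-href">B</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Without [href], every nested anchor matches, including the one lacking href.
all_hrefs = [a.get("href") for a in soup.select(".project-profile-title.text-truncate-xs a")]

# With [href], only anchors that carry the attribute are selected.
only_hrefs = [a.get("href") for a in soup.select(".project-profile-title.text-truncate-xs a[href]")]

print(all_hrefs)
print(only_hrefs)
```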

Upvotes: 2

宏杰李

Reputation: 12178

links = pageGrab.find_all("div", {"class" : "project-profile-title text-truncate-xs"})
for link in links:
    print(link.a.get("href"))  # the div does not have an href; link.a finds the first a tag inside the div, which does

out:

/projects/getpebble/pebble-time-awesome-smartwatch-no-compromises?ref=most_funded
/projects/ryangrepper/coolest-cooler-21st-century-cooler-thats-actually?ref=most_funded
/projects/getpebble/pebble-2-time-2-and-core-an-entirely-new-3g-ultra?ref=most_funded
/projects/poots/kingdom-death-monster-15?ref=most_funded
/projects/getpebble/pebble-e-paper-watch-for-iphone-and-android?ref=most_funded
/projects/597538543/the-worlds-best-travel-jacket-with-15-features-bau?ref=most_funded
/projects/elanlee/exploding-kittens?ref=most_funded
/projects/ouya/ouya-a-new-kind-of-video-game-console?ref=most_funded
/projects/peak-design/the-everyday-backpack-tote-and-sling?ref=most_funded
/projects/antsylabs/fidget-cube-a-vinyl-desk-toy?ref=most_funded

Upvotes: 1
