Reputation: 6965
This should be fairly straightforward. I want to count the result links returned by a search on a webpage. In this example, the search is for "gwen stefani" on Stack Overflow. At the time of writing, the search returns 15 results.
import bs4  # Beautiful Soup 4
import requests
import webbrowser

url = "https://stackoverflow.com/search?q=gwen+stefani"
webbrowser.open(url)  # open the same search in a browser for comparison

r = requests.get(url)
html_content = r.text
soup = bs4.BeautifulSoup(html_content, "html.parser")
print(soup.title)

# Print the href of every anchor on the page
for link in soup.find_all("a"):
    print(link.get("href"))
When the links are printed, they don't include any of the search results mentioned above. I'm new to Beautiful Soup, and I'm not sure where I'm going wrong.
Upvotes: 3
Views: 1702
Reputation: 8740
You can also try the code below, which does not rely on the class of the div element.
Just inspect the page and find the class of the question links.
import bs4  # Beautiful Soup 4
import json
import requests
import webbrowser

url = "https://stackoverflow.com/search?q=gwen+stefani"
webbrowser.open(url)

r = requests.get(url)
html_content = r.text
# Uncomment to save the response for inspection:
# with open('response.html', 'w', encoding="utf-8") as f:
#     f.write(html_content)

soup = bs4.BeautifulSoup(html_content, "html.parser")
print(soup.title)

links = soup.find_all("a", class_='question-hyperlink')
valid_links = {}  # href -> link text; duplicate hrefs collapse into one entry
for link in links:
    href = link.get('href').strip()
    if href.startswith('/questions/'):
        valid_links[href] = link.text.strip()

print(json.dumps(valid_links, indent=4))  # pretty-print the dictionary
print(len(valid_links))  # 15
Output
<title>Posts containing 'gwen stefani' - Stack Overflow</title>
{
"/questions/39268369/what-does-minus-minus-do-in-excel": "Q: What does \u2014 (minus minus) do in Excel? [duplicate]",
"/questions/53264513/using-beautiful-soup-to-count-links-on-requested-page": "Q: Using Beautiful Soup to count links on requested page",
"/questions/31074289/is-there-a-script-that-can-transfer-text-from-an-excel-file-into-an-adobe-design/31100563#31100563": "A: Is there a script that can transfer text from an excel file into an Adobe design program?",
"/questions/39268369/what-does-minus-minus-do-in-excel/39268800#39268800": "A: What does \u2014 (minus minus) do in Excel?",
"/questions/1668447/launch-failed-binary-not-found-snow-leopard-and-eclipse-c-c-ide-issue/8463357#8463357": "A: \u201cLaunch Failed. Binary Not Found.\u201d Snow Leopard and Eclipse C/C++ IDE issue",
"/questions/33023818/split-and-rejoin-path-without-trailing-backslash": "Q: Split and rejoin path without trailing backslash",
"/questions/36986461/regex-match-return-remaining-rest-of-string": "Q: Regex match, return remaining rest of string",
"/questions/44686123/pass-variable-from-javascript-to-windows-batch-file": "Q: Pass variable from JavaScript to Windows batch file",
"/questions/44686123/pass-variable-from-javascript-to-windows-batch-file/44686309#44686309": "A: Pass variable from JavaScript to Windows batch file",
"/questions/52465425/reversing-a-list-with-single-element-gives-none": "Q: Reversing a list with single element gives None [duplicate]",
"/questions/22196612/array-length-outside-of-a-method": "Q: Array length outside of a method",
"/questions/13300815/not-getting-expected-results-from-select-query/13300920#13300920": "A: Not getting expected results from SELECT query",
"/questions/32884087/slicing-string-from-start": "Q: Slicing string from start [duplicate]",
"/questions/53264513/using-beautiful-soup-to-count-links-on-requested-page/53265048#53265048": "A: Using Beautiful Soup to count links on requested page",
"/questions/23337218/recursive-conditions-missing-base-case": "Q: Recursive conditions - missing base case"
}
15
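The filtering step can also be tried without a network request. The following is only a minimal sketch: the HTML snippet is hand-written for illustration and is not Stack Overflow's actual markup, and `collect_result_links` is a hypothetical helper name. It assumes the result anchors carry the `question-hyperlink` class, as in the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only imitates a search results page.
SAMPLE_HTML = """
<div class="result-link"><a class="question-hyperlink" href="/questions/1/first">Q: First</a></div>
<div class="result-link"><a class="question-hyperlink" href="/questions/2/second">Q: Second</a></div>
<div class="result-link"><a class="question-hyperlink" href="/questions/1/first">Q: First</a></div>
<a class="question-hyperlink" href="/help">Help</a>
<a href="/questions/3/third">Unstyled link</a>
"""


def collect_result_links(html):
    """Return {href: text} for 'question-hyperlink' anchors under /questions/."""
    soup = BeautifulSoup(html, "html.parser")
    results = {}
    for anchor in soup.find_all("a", class_="question-hyperlink"):
        # Anchors without an href yield None, so fall back to an empty string.
        href = (anchor.get("href") or "").strip()
        if href.startswith("/questions/"):
            # Keying by href collapses repeated links into one entry.
            results[href] = anchor.get_text(strip=True)
    return results


result_links = collect_result_links(SAMPLE_HTML)
print(len(result_links))
```

Because the dictionary is keyed by href, the repeated link is counted once, and anchors that lack the class or the `/questions/` prefix are skipped.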
Upvotes: 0
Reputation: 861
I'm using Python 3.x, so you might have to adjust for that, but I am getting all 15 links.
from bs4 import BeautifulSoup
import requests

url = 'https://stackoverflow.com/search?q=gwen+stefani'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')

# Each search result sits in a div with class "result-link"
for link in soup.find_all('div', class_='result-link'):
    print('https://stackoverflow.com' + link.a['href'])
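As an alternative to concatenating the site prefix by hand, the standard library's `urljoin` can resolve each href against the base URL. This is just a sketch for Python 3: `to_absolute` is a hypothetical helper name, and the second sample href is a made-up value included to show that an href that is already absolute is left unchanged.

```python
from urllib.parse import urljoin

BASE_URL = "https://stackoverflow.com"


def to_absolute(hrefs, base=BASE_URL):
    """Resolve each href against the base URL."""
    return [urljoin(base, href) for href in hrefs]


# Sample hrefs: one relative path, one made-up absolute URL.
sample_hrefs = [
    "/questions/53264513/using-beautiful-soup-to-count-links-on-requested-page",
    "https://example.com/elsewhere",
]
absolute_links = to_absolute(sample_hrefs)
for absolute in absolute_links:
    print(absolute)
```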
Upvotes: 2