Reputation: 37
I wanted to scrape something as my first program, just to learn the basics really but I'm having trouble showing more than one result.
The premise is going to a forum (http://blackhatworld.com), scrape all thread titles and compare with a string. If it contains the word "free" it will print, otherwise it won't.
Here's the current code:
import requests
from bs4 import BeautifulSoup
page = requests.get('https://www.blackhatworld.com/')
content = BeautifulSoup(page.content, 'html.parser')
threadtitles = content.find_all('a', class_='PreviewTooltip')
n=0
for x in range(len(threadtitles)):
test = list(threadtitles)[n]
test2 = list(test)[0]
if test2.find('free') == -1:
n=n+1
else:
print(test2)
n=n+1
This is the result of running the program: https://i.gyazo.com/6cf1e135b16b04f0807963ce21b2b9be.png
As you can see it's checking for the word "free" and it works but it only shows first result while there are several more in the page.
Upvotes: 0
Views: 72
Reputation: 10184
To solve your problem and simplify your code try this:
import requests
from bs4 import BeautifulSoup
page = requests.get('https://www.blackhatworld.com/')
content = BeautifulSoup(page.content, 'html.parser')
threadtitles = content.find_all('a', class_='PreviewTooltip')
count = 0
for title in threadtitles:
if "free" in title.get_text().lower():
print(title.get_text())
else:
count += 1
print(count)
Bonus: Print value of href
:
for title in threadtitles:
print(title["href"])
See also this.
Upvotes: 1
Reputation: 472
By default, strings comparison is case sensitive (FREE != free
). To solve your problem, first you need to put test2
in lowercase:
test2 = list(test)[0].lower()
Upvotes: 1