Reputation: 861
import requests
from bs4 import BeautifulSoup
url = 'https://www.brightscope.com/401k-rating/240370/Abengoa-Bioenergy-Company-Llc/244317/Abengoa-Bioenergy-Us-401K-Savings-Plan/'
thepage = requests.get(url)
urlsoup = BeautifulSoup(thepage.text, "html.parser")
plandata = urlsoup.find(class_="plans-section").text
print(plandata)
I'm trying to scrape only the rating number class, but when I use this code I get nothing back :(.
My plan is to loop over each scraped page and append its data to a .csv file, one line per page.
Example below:
Rating #1, Company Name1, etc, etc, etc
Rating #2, Company Name2, etc, etc, etc
I just can't get over the hump of figuring this out. Thanks for any help!
Edit: The class "plans-section" holds the data that I want, but it is split across two div tags beneath it. I want to scrape the data in the class "data-text above-average". The problem is that every page shares only the "data-text" part, and what comes after it changes with each section/page. What options do I have?
Upvotes: 0
Views: 2498
Reputation: 12168
import requests
from bs4 import BeautifulSoup
url = 'https://www.brightscope.com/401k-rating/141759/Aj-Kirkwood-Associates-Inc/143902/Aj-Kirkwood-Associates-Inc-401K-Profit-Sharing-Plan/'
thepage = requests.get(url)
urlsoup = BeautifulSoup(thepage.text, "html.parser")
rate = urlsoup.find(class_='rating-number').text
name = urlsoup.find(class_="name").text
print(rate, name)
out:
59 A.J. Kirkwood & Associates, Inc.
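To get the one-line-per-page .csv described in the question, something along these lines might work. It is only a sketch: the function names (scrape_plan, append_row, scrape_to_csv) and the output path are made up for illustration, and it assumes every plan page exposes the same "rating-number" and "name" classes as the page above.

```python
import csv


def scrape_plan(url):
    """Return (rating, company name) for one plan page, or None if either is missing."""
    # Third-party imports are kept local so the CSV helpers work without them.
    import requests
    from bs4 import BeautifulSoup

    page = requests.get(url, timeout=10)
    page.raise_for_status()
    soup = BeautifulSoup(page.text, "html.parser")
    rate = soup.find(class_="rating-number")
    name = soup.find(class_="name")
    if rate is None or name is None:
        # find() returns None when the class is absent, so skip the page
        return None
    return rate.get_text(strip=True), name.get_text(strip=True)


def append_row(csv_path, row):
    """Append a single row to the CSV file, creating the file if needed."""
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(row)


def scrape_to_csv(urls, csv_path, scrape=scrape_plan):
    """Scrape each URL and append one CSV line per page that yielded data."""
    for url in urls:
        row = scrape(url)
        if row is not None:
            append_row(csv_path, row)
```

You would call it as scrape_to_csv(list_of_urls, "ratings.csv"). Checking for None before reading the text avoids the AttributeError you get when find() matches nothing.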
Use re to filter for every class that contains certain text:
If you pass in a regular expression object, Beautiful Soup will filter against that regular expression using its search() method.
In your case:
import re
urlsoup.find_all(class_=re.compile(r'data-text.+'))
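Here is a small offline check of the idea, using a made-up HTML snippet rather than the real page (the markup and the numbers in it are assumptions, not copied from the site). It shows that a plain class_="data-text" already matches tags whose class attribute is "data-text above-average" or "data-text below-average", because Beautiful Soup compares against each class individually; an anchored regex gives the same result here.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure described in the question
html = """
<div class="plans-section">
  <div class="data-text above-average">72</div>
  <div class="data-text below-average">41</div>
  <div class="other">ignored</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string matches any tag that has "data-text" among its classes
by_string = [tag.get_text(strip=True) for tag in soup.find_all(class_="data-text")]

# A compiled regex is searched against each class of the tag
by_regex = [
    tag.get_text(strip=True)
    for tag in soup.find_all(class_=re.compile(r"^data-text"))
]

print(by_string)
print(by_regex)
```

Either form lets you ignore the part of the class that changes from page to page.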
Upvotes: 2
Reputation: 1515
What exactly do you want to get out of the page? If you are looking to get a div by its class, this should help.
urlsoup.findAll("div", { "class" :"rating-number"})
Upvotes: 1