Kamikaze_goldfish

Reputation: 861

Python and beautifulsoup - Scrape Text

import requests
from bs4 import BeautifulSoup

url = 'https://www.brightscope.com/401k-rating/240370/Abengoa-Bioenergy-Company-Llc/244317/Abengoa-Bioenergy-Us-401K-Savings-Plan/'
thepage = requests.get(url)
urlsoup = BeautifulSoup(thepage.text, "html.parser")

plandata = urlsoup.find(class_="plans-section").text

print(plandata)

I'm trying to scrape only the element with the class "rating-number", but when I use this code I get nothing back :(.

  1. How do I scrape only the element with the class "rating-number"?
  2. How could I scrape multiple classes (this is the most important part) and put them into a readable list?

My idea is to loop over each scraped page and append its data to a .csv file, one line per page.

For example:

Rating #1, Company Name1, etc, etc, etc

Rating #2, Company Name2, etc, etc, etc
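The loop-and-append plan above could look roughly like this. It is a sketch under a few assumptions: the HTML strings in `pages` are made-up stand-ins for pages you would fetch with `requests.get(url).text`, the class names "rating-number" and "name" are the ones used in the first answer, and `extract_row` and `write_csv` are hypothetical helper names.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-ins for the HTML of pages fetched with requests.
pages = [
    '<div class="rating-number">59</div><h1 class="name">Company One</h1>',
    '<div class="rating-number">72</div><h1 class="name">Company Two</h1>',
]


def extract_row(html):
    """Return [rating, name] for one page, using '' if an element is missing."""
    soup = BeautifulSoup(html, "html.parser")
    row = []
    for cls in ("rating-number", "name"):  # assumed class names
        tag = soup.find(class_=cls)  # find() returns None when nothing matches
        row.append(tag.get_text(strip=True) if tag is not None else "")
    return row


def write_csv(rows, fileobj):
    """Write a header line, then one CSV line per scraped page."""
    writer = csv.writer(fileobj)
    writer.writerow(["rating", "name"])
    writer.writerows(rows)


buffer = io.StringIO()
write_csv((extract_row(html) for html in pages), buffer)
print(buffer.getvalue())
```

To write a real file instead of a buffer, open it with `open("plans.csv", "w", newline="")` and pass that file object to `write_csv`. Checking for `None` before calling `.get_text()` keeps one page with a missing element from crashing the whole loop.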

I just can't get over the hump of figuring this out. Thanks for any help!

Edit: The class "plans-section" holds the data I want, but it seems to be split across two div tags beneath it. I want to scrape the data in the class "data-text above-average". The problem is that every page has the same "data-text" part, while the class that comes after it changes from section to section and page to page. What options do I have?

Upvotes: 0

Views: 2498

Answers (2)

宏杰李

Reputation: 12168

import requests
from bs4 import BeautifulSoup


url = 'https://www.brightscope.com/401k-rating/141759/Aj-Kirkwood-Associates-Inc/143902/Aj-Kirkwood-Associates-Inc-401K-Profit-Sharing-Plan/'
thepage = requests.get(url)
urlsoup = BeautifulSoup(thepage.text, "html.parser")

rate = urlsoup.find(class_='rating-number').text
name = urlsoup.find(class_="name").text
print(rate, name)

out:

59 A.J. Kirkwood & Associates, Inc.

Use a regular expression filter to match every element whose class contains certain text. From the Beautiful Soup documentation:

If you pass in a regular expression object, Beautiful Soup will filter against that regular expression using its search() method.

In your case:

import re
urlsoup.find_all(class_=re.compile(r'data-text.+'))
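A self-contained check of that regex filter, run against made-up markup (the class names after "data-text" are assumptions, not taken from the real page). Because `class` is a multi-valued attribute, Beautiful Soup runs the pattern's `search()` against each individual class and, for elements with several classes, against the space-joined class string too. That is why `data-text.+` matches "data-text above-average", but it also means an element whose class is exactly "data-text" is not matched.

```python
import re

from bs4 import BeautifulSoup

# Made-up markup; the real page's class names may differ.
html = """
<div class="plans-section">
  <div class="data-text above-average">Above average</div>
  <div class="data-text below-average">Below average</div>
  <div class="data-text">Plain</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The pattern needs at least one character after "data-text", so it only
# matches via the joined class string of multi-class elements.
matches = soup.find_all(class_=re.compile(r"data-text.+"))
print([tag.get_text(strip=True) for tag in matches])
```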

Upvotes: 2

user1211

Reputation: 1515

What exactly do you want to get out of the page? If you are looking to get a div by its class, this should help:

urlsoup.find_all("div", {"class": "rating-number"})
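That call returns a list of Tag objects, so turning them into a readable list of strings takes one more step. A minimal sketch with made-up markup (the extra class "highlighted" is hypothetical): a string class filter matches any element that has "rating-number" among its classes, which is also why searching for "data-text" alone would find "data-text above-average".

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the scraped page.
html = """
<div class="rating-number">59</div>
<div class="rating-number highlighted">72</div>
<span class="rating-number">not a div, so not returned</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Matches <div> tags that have "rating-number" among their classes,
# even when other classes are present as well.
divs = soup.find_all("div", {"class": "rating-number"})
ratings = [div.get_text(strip=True) for div in divs]
print(ratings)
```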

Upvotes: 1
