Reputation: 43

How to extract data from HTML using Beautiful Soup

I am trying to scrape a web page and store the results in a CSV/Excel file. I am using Beautiful Soup for this.

I am trying to extract the data from the soup using the find_all function, but I am not sure how to capture the value in the name field (the itemprop="name" span) or the title attribute.

The HTML has the following format:

<h3 class="font20">
 <span itemprop="position">36.</span> 
 <a class="font20 c_name_head weight700 detail_page" 
 href="/companies/view/1033/nimblechapps-pvt-ltd" target="_blank" 
 title="Nimblechapps Pvt. Ltd."> 
     <span itemprop="name">Nimblechapps Pvt. Ltd. </span>
</a> </h3>

This is my code so far; I am not sure how to proceed from here:

from bs4 import BeautifulSoup as BS
import requests

page = 'https://www.goodfirms.co/directory/platform/app-development/iphone?page=2'
res = requests.get(page)
cont = BS(res.content, "html.parser")

# Two attempts at selecting the company links (the second overwrites the first)
names = cont.find_all(class_='font20 c_name_head weight700 detail_page')
names = cont.find_all('a', attrs={'class': 'font20 c_name_head weight700 detail_page'})

I have tried the following:

Input: cont.h3.a.span
Output: <span itemprop="name">Nimblechapps Pvt. Ltd.</span>

I want to extract just the company name: "Nimblechapps Pvt. Ltd."
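For reference, here is a minimal offline sketch of the extraction being asked about, run against the HTML fragment above rather than the live page (so it does not depend on the site's current markup). It reads the company name from both places it appears: the text of the itemprop="name" span, using get_text(strip=True) to drop the surrounding whitespace, and the title attribute of the link, using .get("title").

```python
from bs4 import BeautifulSoup

# The HTML fragment from the question, parsed offline (no network request).
html = """
<h3 class="font20">
 <span itemprop="position">36.</span>
 <a class="font20 c_name_head weight700 detail_page"
 href="/companies/view/1033/nimblechapps-pvt-ltd" target="_blank"
 title="Nimblechapps Pvt. Ltd.">
     <span itemprop="name">Nimblechapps Pvt. Ltd. </span>
</a> </h3>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the company link by a single one of its classes.
link = soup.find("a", class_="detail_page")

# Text of the nested name span, with surrounding whitespace removed.
name = link.find("span", itemprop="name").get_text(strip=True)

# The same name is also stored in the link's title attribute.
title = link.get("title")

print(name)   # Nimblechapps Pvt. Ltd.
print(title)  # Nimblechapps Pvt. Ltd.
```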

Upvotes: 4

Answers (3)

SIM

Reputation: 22440

Try not to match on the full compound class string in your script, as it is prone to break (for example, if the site reorders or adds a class). The following script should fetch the required content as well:

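import requests
from bs4 import BeautifulSoup

link = "https://www.goodfirms.co/directory/platform/app-development/iphone?page=2"

res = requests.get(link)
soup = BeautifulSoup(res.text, 'html.parser')
# Loop over each company block (class "commoncompanydetail" on the listing page)
for items in soup.find_all(class_="commoncompanydetail"):
    # Match the link on a single class and strip the whitespace around the name
    names = items.find(class_='detail_page').get_text(strip=True)
    print(names)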
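To illustrate why the full class string is fragile, the sketch below parses two hypothetical links carrying the same four classes in a different order (the markup is made up for the demonstration, not taken from the site). Passing the whole string to class_ only matches that exact attribute value, while a single class, or a CSS selector listing the classes, matches both links.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same classes, different order in the second link.
html = """
<a class="font20 c_name_head weight700 detail_page">Company A</a>
<a class="detail_page font20 weight700 c_name_head">Company B</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact multi-class string: matches only when the class attribute is
# exactly this string, so the reordered link is missed.
exact = [a.get_text() for a in soup.find_all("a", class_="font20 c_name_head weight700 detail_page")]

# Single class: matches any <a> whose class list contains "detail_page".
single = [a.get_text() for a in soup.find_all("a", class_="detail_page")]

# CSS selector: requires both listed classes, in any order.
css = [a.get_text() for a in soup.select("a.c_name_head.detail_page")]

print(exact)   # ['Company A']
print(single)  # ['Company A', 'Company B']
print(css)     # ['Company A', 'Company B']
```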

Upvotes: 1

QHarr

Reputation: 84465

The same thing, but using the descendant combinator (a space) to combine the type selector a with the attribute-value selector [itemprop="name"]:

names = [item.get_text(strip=True) for item in cont.select('a [itemprop="name"]')]
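A small offline sketch of how that selector behaves, using the question's fragment plus one hypothetical extra span that sits outside any link. The space in 'a [itemprop="name"]' is what makes it a descendant selector; written without the space, 'a[itemprop="name"]' looks for an <a> that itself carries itemprop="name", which matches nothing here.

```python
from bs4 import BeautifulSoup

# Fragment from the question, plus a hypothetical itemprop="name" span
# outside any <a>, to show what the descendant combinator excludes.
html = """
<h3 class="font20">
 <span itemprop="position">36.</span>
 <a class="font20 c_name_head weight700 detail_page"
    href="/companies/view/1033/nimblechapps-pvt-ltd"
    title="Nimblechapps Pvt. Ltd.">
     <span itemprop="name">Nimblechapps Pvt. Ltd. </span>
 </a>
</h3>
<div><span itemprop="name">Not a company link</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

# 'a [itemprop="name"]': elements with itemprop="name" nested at any depth inside an <a>.
inside_links = [s.get_text(strip=True) for s in soup.select('a [itemprop="name"]')]

# Without the 'a ' prefix, every itemprop="name" element matches.
everywhere = [s.get_text(strip=True) for s in soup.select('[itemprop="name"]')]

# No space: an <a> that itself has itemprop="name" (none in this fragment).
no_space = [s.get_text(strip=True) for s in soup.select('a[itemprop="name"]')]

print(inside_links)  # ['Nimblechapps Pvt. Ltd.']
print(everywhere)    # ['Nimblechapps Pvt. Ltd.', 'Not a company link']
print(no_space)      # []
```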

Upvotes: 1

drec4s

Reputation: 8077

You can use a list comprehension for that:

from bs4 import BeautifulSoup as BS
import requests

page = 'https://www.goodfirms.co/directory/platform/app-development/iphone?page=2'
res = requests.get(page)
cont = BS(res.content, "html.parser")
names = cont.find_all('a', attrs={'class': 'font20 c_name_head weight700 detail_page'})
# get_text(strip=True) drops the newlines and spaces around the nested <span>
print([n.get_text(strip=True) for n in names])

You will get:

['Nimblechapps Pvt. Ltd.', (..) , 'InnoApps Technologies Pvt. Ltd', 'Umbrella IT', 'iQlance Solutions', 'getyoteam', 'JetRuby Agency LTD.', 'ONLINICO', 'Dedicated Developers', 'Appingine', 'webnexs']
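The question also mentions saving the results to a CSV/Excel file. A minimal sketch using Python's standard csv module, assuming names is a plain list of strings like the one printed above (a short sample is hard-coded here) and writing to a hypothetical file called companies.csv, which Excel can open directly:

```python
import csv

# Assumed input: a list of company names such as the one printed above
# (a short sample is hard-coded here for illustration).
names = ["Nimblechapps Pvt. Ltd.", "Umbrella IT", "iQlance Solutions"]

# "companies.csv" is a hypothetical output path.
# newline="" is what the csv module docs recommend for files it writes.
with open("companies.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["name"])                   # header row
    writer.writerows([name] for name in names)  # one company per row
```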

Upvotes: 2
