Reputation: 43
I am trying to scrape a web page and store the results in a CSV/Excel file. I am using Beautiful Soup for this.
I am trying to extract the data from the soup using the find_all function, but I am not sure how to capture the data in the name or title field.
The HTML file has the following format:
<h3 class="font20">
<span itemprop="position">36.</span>
<a class="font20 c_name_head weight700 detail_page"
href="/companies/view/1033/nimblechapps-pvt-ltd" target="_blank"
title="Nimblechapps Pvt. Ltd.">
<span itemprop="name">Nimblechapps Pvt. Ltd. </span>
</a> </h3>
This is my code so far. I am not sure how to proceed from here:
from bs4 import BeautifulSoup as BS
import requests
page = 'https://www.goodfirms.co/directory/platform/app-development/iphone?page=2'
res = requests.get(page)
cont = BS(res.content, "html.parser")
names = cont.find_all(class_ = 'font20 c_name_head weight700 detail_page')
names = cont.find_all('a', attrs={'class': 'font20 c_name_head weight700 detail_page'})
I have tried using the following -
Input: cont.h3.a.span
Output: <span itemprop="name">Nimblechapps Pvt. Ltd.</span>
I want to extract the name of the company - "Nimblechapps Pvt. Ltd."
Upvotes: 4
Views: 172
Reputation: 22440
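Try not to match on compound classes (several class names in one string) in the script, as such matches break easily when the page changes the classes or their order. The following script fetches the required content by matching single class names instead.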
import requests
from bs4 import BeautifulSoup
link = "https://www.goodfirms.co/directory/platform/app-development/iphone?page=2"
res = requests.get(link)
soup = BeautifulSoup(res.text, 'html.parser')
for items in soup.find_all(class_="commoncompanydetail"):
    names = items.find(class_='detail_page').text
    print(names)
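To illustrate why the compound class string is fragile, here is a minimal offline sketch that parses the HTML snippet from the question instead of fetching the live page. The variable names (exact, reordered, single) are only for illustration, and the reordered class string is a made-up variant, not something taken from the site.

```python
from bs4 import BeautifulSoup

# HTML snippet copied from the question, so no network access is needed.
html = """
<h3 class="font20">
<span itemprop="position">36.</span>
<a class="font20 c_name_head weight700 detail_page"
href="/companies/view/1033/nimblechapps-pvt-ltd" target="_blank"
title="Nimblechapps Pvt. Ltd.">
<span itemprop="name">Nimblechapps Pvt. Ltd. </span>
</a> </h3>
"""
soup = BeautifulSoup(html, "html.parser")

# The full class string matches only when it equals the attribute exactly.
exact = soup.find_all(class_="font20 c_name_head weight700 detail_page")

# Same classes in a different order (hypothetical variant): no match.
reordered = soup.find_all(class_="detail_page font20 c_name_head weight700")

# A single class name matches regardless of the other classes or their order.
single = soup.find_all(class_="detail_page")

print(len(exact), len(reordered), len(single))

# get_text(strip=True) drops the surrounding newlines and the trailing space.
name = single[0].get_text(strip=True)
print(name)
```

Matching on one stable class such as detail_page keeps working if the site adds, removes, or reorders the other classes on the tag.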
Upvotes: 1
Reputation: 84465
This does the same thing, but uses the descendant combinator " " (a space) to combine the type selector a with the attribute = value selector [itemprop="name"]:
names = [item.text for item in cont.select('a [itemprop="name"]')]
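As a rough, self-contained check of that selector, the sketch below runs it against the HTML snippet from the question rather than the live page. It also reads the title attribute of the link, since the question asks about either the name or the title. The titles list and the a[title] selector are additions for illustration and assume the real page keeps the title attribute on the link.

```python
from bs4 import BeautifulSoup

# HTML snippet copied from the question, parsed offline.
html = """
<h3 class="font20">
<span itemprop="position">36.</span>
<a class="font20 c_name_head weight700 detail_page"
href="/companies/view/1033/nimblechapps-pvt-ltd" target="_blank"
title="Nimblechapps Pvt. Ltd.">
<span itemprop="name">Nimblechapps Pvt. Ltd. </span>
</a> </h3>
"""
soup = BeautifulSoup(html, "html.parser")

# Descendant combinator: any element with itemprop="name" inside an <a>.
# strip=True removes the trailing space present in the span text.
names = [item.get_text(strip=True) for item in soup.select('a [itemprop="name"]')]

# Alternative: read the title attribute of every <a> that has one.
titles = [a["title"] for a in soup.select("a[title]")]

print(names)
print(titles)
```

The span with itemprop="position" is not selected because it is not inside the a element and its itemprop value differs.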
Upvotes: 1
Reputation: 8077
You can use a list comprehension for that:
from bs4 import BeautifulSoup as BS
import requests
page = 'https://www.goodfirms.co/directory/platform/app-development/iphone?page=2'
res = requests.get(page)
cont = BS(res.content, "html.parser")
names = cont.find_all('a' , attrs = {'class':'font20 c_name_head weight700 detail_page'})
print([n.text for n in names])
You will get:
['Nimblechapps Pvt. Ltd.', (..) , 'InnoApps Technologies Pvt. Ltd', 'Umbrella IT', 'iQlance Solutions', 'getyoteam', 'JetRuby Agency LTD.', 'ONLINICO', 'Dedicated Developers', 'Appingine', 'webnexs']
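Since the question also mentions storing the results in a CSV file, here is one possible sketch using only the standard library csv module. The helper write_names_csv, the column header "name", and the file name companies.csv are all hypothetical, and the sample list holds just two of the names shown in the output above.

```python
import csv
import io


def write_names_csv(names, fh):
    """Write one company name per row to an open text file handle."""
    writer = csv.writer(fh)
    writer.writerow(["name"])  # header row (hypothetical column name)
    for n in names:
        # Strip stray whitespace left over from the HTML.
        writer.writerow([n.strip()])


# Two sample names taken from the output above; a real run would pass
# the full list built from the scraped tags.
sample = ["Nimblechapps Pvt. Ltd. ", "Umbrella IT"]

# Demonstrate with an in-memory buffer so the sketch has no side effects.
buf = io.StringIO()
write_names_csv(sample, buf)
print(buf.getvalue())

# To write a real file, open it with newline="" as the csv docs recommend:
# with open("companies.csv", "w", newline="", encoding="utf-8") as fh:
#     write_names_csv(sample, fh)
```

Passing a file handle rather than a path keeps the helper easy to try out without touching the disk.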
Upvotes: 2