Reputation: 105
I am an absolute beginner in Python programming. I am practicing web scraping on some websites using the bs4 module.
I want to fetch the links from the website and then iterate through them: each link opens a new page, and from that page I want to extract the agent names. There are many links, so I tried to collect them into a list first and then iterate over it, but the list comes back empty. Kindly tell me what I am doing wrong and what should be done.
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
res = requests.get('https://www.mcgrath.com.au/offices', headers = {'User-agent': 'Super Bot 9000'})
soup = bs(res.content, 'lxml')
links = [item['href'] for item in soup.select('.align w-1140 p-none a')]
print(links)
Upvotes: 0
Views: 61
Reputation: 5833
You are using the wrong selector. Instead, you should use .align.w-1140.p-none > a, like this:
links = [item['href'] for item in soup.select('.align.w-1140.p-none > a') if item['href'] != '/']
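It's because a tag such as <div class="align w-1140"> carries several separate classes, and a CSS selector matches them on one element only when they are joined with dots and no spaces. A space in a selector is a descendant combinator, so '.align w-1140 p-none a' looks for <w-1140> and <p-none> tags nested inside an .align element. No such tags exist, which is why your list is empty.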
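To see the difference on a small scale, here is a minimal sketch. The HTML snippet is made up for illustration (it is an assumption, not the real page markup), and it is parsed with the built-in html.parser so it runs without a network call:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, only for illustration; the real page will differ.
html = """
<div class="align w-1140 p-none">
  <a href="/">Home</a>
  <a href="/offices/178-annerley-yeronga">Annerley</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Spaces are descendant combinators, so this searches for <w-1140> and
# <p-none> tags inside an .align element and finds nothing.
broken = soup.select(".align w-1140 p-none a")

# Dots chained without spaces match ONE element carrying all three classes;
# "> a" then takes its direct <a> children.
fixed = [
    a["href"]
    for a in soup.select(".align.w-1140.p-none > a")
    if a["href"] != "/"
]

print(broken)
print(fixed)
```

The first selector returns an empty list, the second returns only the office link.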
And then, to get the agents' email addresses from one of those pages, you can do:
res = requests.get('https://www.mcgrath.com.au/offices/178-annerley-yeronga', headers = {'User-agent': 'Super Bot 9000'})
soup = bs(res.content, 'lxml')
agents_mails = [item['href'] for item in soup.select('.agent a[href^=mailto]')]
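Putting both steps together, one possible sketch is below. The helper names (office_urls, agent_mails, scrape_all) and the fetch parameter are my own hypothetical choices, not part of bs4 or requests. It assumes the hrefs on the index page are site-relative, so they are resolved with urljoin before being requested:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.mcgrath.com.au/offices"


def office_urls(index_html, base_url=BASE_URL):
    """Return absolute office URLs found on the index page."""
    soup = BeautifulSoup(index_html, "html.parser")
    hrefs = [
        a["href"]
        for a in soup.select(".align.w-1140.p-none > a[href]")
        if a["href"] != "/"
    ]
    # Assumption: hrefs may be relative, so resolve them against the base URL.
    return [urljoin(base_url, href) for href in hrefs]


def agent_mails(office_html):
    """Return the mailto: links of the agents on one office page."""
    soup = BeautifulSoup(office_html, "html.parser")
    return [a["href"] for a in soup.select(".agent a[href^=mailto]")]


def scrape_all(fetch, start_url=BASE_URL):
    """Map each office URL to its agent mail links.

    fetch is any callable that takes a URL and returns the page HTML.
    """
    return {
        url: agent_mails(fetch(url))
        for url in office_urls(fetch(start_url), start_url)
    }
```

With requests, fetch could be something like `lambda url: requests.get(url, headers={'User-agent': 'Super Bot 9000'}).text`. Passing the download step in as a parameter keeps the parsing easy to try out on saved HTML before hitting the site for every office.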
Upvotes: 1