ag2019

Reputation: 105

Fetching web page links from a website and iterating through those links to fetch further information

I am an absolute beginner in Python programming. I am practicing web scraping on some websites using the bs4 module.

I want to fetch the links from the website and then iterate through them: each link opens a new page, and from that page I want to extract the agent names. There are many links, so I tried to collect them into a list first and then iterate through it, but the list comes back empty. Please tell me what I am doing wrong and what I should do instead.

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

res = requests.get('https://www.mcgrath.com.au/offices', headers = {'User-agent': 'Super Bot 9000'})
soup = bs(res.content, 'lxml')

links = [item['href'] for item in soup.select('.align w-1140 p-none a')]
print(links)

Upvotes: 0

Views: 61

Answers (1)

andreihondrari

Reputation: 5833

You are using the wrong selector. Use .align.w-1140.p-none > a instead, like this:

links = [item['href'] for item in soup.select('.align.w-1140.p-none > a') if item['href'] != '/']

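That's because several classes on the same element, as in <div class="align w-1140">, are matched by joining the class selectors without spaces. With spaces in between, w-1140 and p-none are read as tag names of nested descendant elements, so nothing matches and the list is empty.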
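To see the difference between the two selectors without hitting the site, here is a minimal sketch on a made-up HTML snippet. The markup is an assumption modelled on the class names in the question, not copied from the real page, and it uses the built-in html.parser instead of lxml only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on the class names in the question.
html = """
<div class="align w-1140 p-none">
  <a href="/">Home</a>
  <a href="/offices/178-annerley-yeronga">Annerley</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Spaces mean "descendant": this looks for an <a> inside a <p-none> element
# inside a <w-1140> element inside something with class "align".
spaced = soup.select(".align w-1140 p-none a")

# Joined class selectors must all match the same element;
# "> a" then takes that element's direct <a> children.
joined = soup.select(".align.w-1140.p-none > a")

# Skip the link that only points back to the home page.
links = [a["href"] for a in joined if a["href"] != "/"]

print(len(spaced))  # 0
print(links)        # ['/offices/178-annerley-yeronga']
```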

And then, to get the agents' email addresses, you can do:

res = requests.get('https://www.mcgrath.com.au/offices/178-annerley-yeronga', headers = {'User-agent': 'Super Bot 9000'})
soup = bs(res.content, 'lxml')
agents_mails = [item['href'] for item in soup.select('.agent a[href^=mailto]')]
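To go from one hard-coded office page to all of them, the two steps can be combined in a loop. This is only a sketch under some assumptions: the hrefs on the listing page are taken to be relative (hence urljoin), the function names office_links, agent_emails, fetch and scrape_all_emails are made up for illustration, and html.parser stands in for lxml.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.mcgrath.com.au/offices"
HEADERS = {"User-agent": "Super Bot 9000"}


def office_links(html, base_url=BASE_URL):
    """Return absolute office URLs found in the listing page HTML."""
    soup = BeautifulSoup(html, "html.parser")
    anchors = soup.select(".align.w-1140.p-none > a")
    # urljoin turns relative hrefs such as "/offices/..." into full URLs
    # (assumption: the site uses relative links).
    return [
        urljoin(base_url, a["href"])
        for a in anchors
        if a.get("href") not in (None, "/")
    ]


def agent_emails(html):
    """Return email addresses from one office page, without 'mailto:'."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = [a["href"] for a in soup.select(".agent a[href^=mailto]")]
    return [href[len("mailto:"):] for href in hrefs]


def fetch(url):
    """Download a page and return its HTML text (makes a network request)."""
    res = requests.get(url, headers=HEADERS, timeout=10)
    res.raise_for_status()
    return res.text


def scrape_all_emails():
    """Visit every office link and collect the emails per office URL."""
    results = {}
    for url in office_links(fetch(BASE_URL)):
        results[url] = agent_emails(fetch(url))
    return results
```

Calling scrape_all_emails() makes one request per office, so it may take a while. To get agent names instead of emails, swap the selector in agent_emails for whatever element holds the name on the office page.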

Upvotes: 1
