Preston G

Reputation: 69

Scraping realtor data with beautifulsoup

I was trying to help out some realtor friends by scraping some data off of realtor.com with beautifulsoup.

I am trying to get a list of the realtors' names and phone numbers, but each value comes back as a separate list item, and every realtor on the page appears twice.

This is what I currently have:

from bs4 import BeautifulSoup
import requests
import numpy as np
import pandas as pd

allRealtors = []
pages = np.arange(1, 2, 1)
for page in pages:
    page = requests.get("https://www.realtor.com/realestateagents/New-Orleans_LA/pg-" + str(page))
    soup = BeautifulSoup(page.text, 'html.parser')
    realtors = soup.find_all('div', {"class": ['jsx-1448471805 agent-name text-bold', 'jsx-1448471805 agent-phone hidden-xs hidden-xxs']})
    for item in realtors:
        allRealtors += item
print(allRealtors)

Here are my current results for the list allRealtors:

['Lisa Shedlock', '(504) 330-8233', 'Lisa Shedlock', '(504) 330-8233', 'Heather Laughlin', '(504) 256-6180', 'Heather Laughlin', '(504) 256-6180', 'LIZ ASHE', '(504) 401-4285', 'LIZ ASHE', '(504) 401-4285', 'Richard Haffner', '(504) 456-2961', 'Richard Haffner', '(504) 456-2961', 'Shelly Vallee', '(504) 975-6014', 'Shelly Vallee', '(504) 975-6014', 'Britt Galloway, Agent', '(504) 455-0100', 'Britt Galloway, Agent', '(504) 455-0100', 'Catherine Goens Gerrets, Agent', '(504) 439-8464', 'Catherine Goens Gerrets, Agent', '(504) 439-8464', 'Suzy Lamore', '(504) 729-8818', 'Suzy Lamore', '(504) 729-8818', 'Patti Faulder', '(504) 799-1702', 'Patti Faulder', '(504) 799-1702']

It is creating duplicates for each realtor's name and phone number. Ideally, I would like each name and number pair to come in as a dictionary, like this:

[{'name': 'Lisa Shedlock', 'number': '(504) 330-8233'}, {'name': 'Heather Laughlin', 'number': '(504) 256-6180'}]

And then I would turn those dictionaries into a pandas DataFrame with the columns name and phone number.

However, this is one of my first times using BeautifulSoup, and I am not sure how to accomplish this. Any suggestions?

Any simpler ways to accomplish this?

Thanks!

Upvotes: 1

Views: 242

Answers (1)

Gealber

Reputation: 492

You could use CSS selectors with soup.select() to collect the names and the phone numbers separately, then pair them up with zip():

from bs4 import BeautifulSoup
import requests
import numpy as np
import pandas as pd

realtors_data = {}
# np.arange(1, 2, 1) yields only page 1; raise the stop value to fetch more pages
pages = np.arange(1, 2, 1)
print("PAGES: ", pages)
# CSS selectors for the name and phone elements
names_selector = "ul > div > div > div > div > div > a > div"
phone_selectors = "ul > div > div > div > div > div > div.jsx-1448471805.agent-phone.hidden-xs.hidden-xxs"
for page in pages:
    # Use a separate name for the response so the loop variable is not overwritten
    response = requests.get("https://www.realtor.com/realestateagents/New-Orleans_LA/pg-" + str(page))
    soup = BeautifulSoup(response.text, 'html.parser')
    names = soup.select(names_selector)
    phones = soup.select(phone_selectors)

    # Pair each name with its phone number; keying the dict by name drops repeats
    for name, phone in zip(names, phones):
        realtors_data[name.get_text()] = phone.get_text()

# Printing data
print(realtors_data)

OUTPUT:

{'Lisa Shedlock': '(504) 330-8233', 'Heather Laughlin': '(504) 256-6180', 'LIZ ASHE': '(504) 401-4285', 'Richard Haffner': '(504) 456-2961', 'Shelly Vallee': '(504) 975-6014', 'Britt Galloway, Agent': '(504) 455-0100', 'Catherine Goens Gerrets, Agent': '(504) 439-8464', 'Suzy Lamore': '(504) 729-8818', 'Patti Faulder': '(504) 799-1702', 'Susan Ann Bourgeois': '(504) 236-7836', 'Lane Washburn': '(504) 909-0824', 'Brandy Dufrene': '(504) 330-2963', 'Claire E Hohensee': '(504) 654-9353', 'Aaron DareTeam': '(504) 899-8666', 'Kara Breithaupt': '(504) 444-6400', 'Joli Tolbert-Burrell': '(504) 982-5654', 'AMANDA MILLER': '(504) 250-0059', 'Carla Lawson': '(504) 329-5164', 'Michael D. Lester': '(504) 559-4652', 'Michael A. Newcomer': '(504) 321-1654'}
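
If you would rather keep the flat list from the question and get the list-of-dictionaries shape you asked for, something like the following might work. It is only a sketch using the standard library: the helper name pair_records is made up for illustration, and it assumes the list always alternates name, phone, name, phone, as in your output.

```python
def pair_records(flat):
    """Pair up [name, phone, name, phone, ...] and drop repeated pairs, keeping order.

    pair_records is a hypothetical helper, not part of bs4 or pandas.
    """
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of items (name, phone, ...)")
    # Even positions hold names, odd positions hold phone numbers
    pairs = zip(flat[0::2], flat[1::2])
    # dict.fromkeys keeps first-seen order and discards repeated (name, phone) tuples
    unique_pairs = dict.fromkeys(pairs)
    return [{"name": name, "number": number} for name, number in unique_pairs]


# Sample values copied from the question's output
flat = [
    'Lisa Shedlock', '(504) 330-8233', 'Lisa Shedlock', '(504) 330-8233',
    'Heather Laughlin', '(504) 256-6180', 'Heather Laughlin', '(504) 256-6180',
]
records = pair_records(flat)
print(records)
```

Passing that list to pd.DataFrame(records) should then give one row per realtor with name and number columns. For the name-to-phone dictionary printed above, pd.DataFrame(list(realtors_data.items()), columns=["name", "phone number"]) should do the same job.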

Upvotes: 1
