Gad

Reputation: 33

Trouble web scraping on Python

I'm a high school student practicing Python. For a final project, I wanted to use web scraping (which we haven't covered in class). My code below is supposed to ask a user for their birthday and then print a list of celebrities who share it (ignoring the year of birth).

import requests
from bs4 import BeautifulSoup

print("Please enter your birthday:")
BD_Day = input("Day: ")
BD_Month = input("Month (1-12): ")
Months = ('January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December')
Month = dict(zip(range(12), Months))
BD_Month = int(BD_Month)
messy_url = ['http://www.famousbirthdays.com/', Month[BD_Month - 1], BD_Day, '.html']
url = ''.join(messy_url)
r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')
spans = soup.find_all('span', attrs={'class':'title'})
for span in spans:
    print (span.string)

The code is supposed to fetch the web page stored in 'url'. However, it always prints a list of people born on November 6.

The code also prints only 5 of the 48 names on the page: numbers 1 through 6, oddly skipping number 5.

My two main issues are the wrong date and the incomplete list of names. Any input would be appreciated.

Thanks.

Upvotes: 0

Views: 93

Answers (1)

user14002256


I would guess that the problem comes from either the URL or the span tags: the website puts each person inside an <a> element, which is itself nested inside <div> elements.
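To make the URL part concrete, here is a small sketch that builds the address both ways so they can be compared or pasted into a browser. The build_url helper and its lowercase flag are made up for illustration. That the site wants a lowercase month name and a day without a leading zero are assumptions, not something I have confirmed.

```python
MONTHS = ('January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December')

def build_url(day, month, lowercase=True):
    """Build the birthday page URL (hypothetical helper, for illustration only)."""
    name = MONTHS[int(month) - 1]          # months are entered as 1-12
    if lowercase:
        name = name.lower()                # assumption: the site expects 'march', not 'March'
    # int() tolerates stray spaces and drops a leading zero, e.g. ' 06 ' -> 6
    # (assumption: the site does not zero-pad the day)
    return "https://www.famousbirthdays.com/" + name + str(int(day)) + ".html"

print(build_url("14", "3"))                   # lowercase month
print(build_url("14", "3", lowercase=False))  # capitalized month, as in the question
```

If the two addresses lead to different pages in a browser, the capitalization is the likely cause of the wrong date.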

So, here's how I did it:

import requests
from bs4 import BeautifulSoup

#ask for birthday
print("Please enter your birthday:")
BD_Day = input("Day: ")
BD_Month = input("Month (1-12): ")
Months = ('January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December')

#make URL
url = "https://www.famousbirthdays.com/" + Months[int(BD_Month) - 1].lower() + BD_Day + ".html"

#make HTTP request
response = requests.get(url=url)

#parse HTML
page = BeautifulSoup(response.content, 'html.parser')

#find list of all people based on website's HTML
all_people = page.find("div",{"class":"people-list"}).find_all("a",{"class":"person-item"})

#show all people
for person in all_people:
    print(person.find("div",{"class":"info"}).find("div",{"class":"name"}).get_text().strip())
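One caveat about the lookup above: find() returns None when nothing matches, so the chained calls raise an AttributeError if the page layout differs from what the code expects. Here is a minimal sketch with guards that can be tried without a network connection. SAMPLE_HTML and the extract_names helper are made up for illustration; only the class names come from the code above.

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the structure the code above assumes.
SAMPLE_HTML = """
<div class="people-list">
  <a class="person-item" href="#"><div class="info"><div class="name"> Person One </div></div></a>
  <a class="person-item" href="#"><div class="info"><div class="name">Person Two</div></div></a>
  <a class="person-item" href="#"><div class="info"></div></a>
</div>
"""

def extract_names(html):
    """Return the names found in the people list, or [] if the list is missing."""
    page = BeautifulSoup(html, "html.parser")
    container = page.find("div", {"class": "people-list"})
    if container is None:                  # layout changed or wrong page returned
        return []
    names = []
    for person in container.find_all("a", {"class": "person-item"}):
        name = person.find("div", {"class": "name"})
        if name is not None:               # skip entries without a name element
            names.append(name.get_text().strip())
    return names

print(extract_names(SAMPLE_HTML))
print(extract_names("<html></html>"))
```

With the real site, you would pass response.content instead of SAMPLE_HTML. An empty result then tells you the page structure is not what the selectors expect.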

I hope this helps!

Upvotes: 1
