I'm a high school student practicing Python. For a final project, I wanted to use web scraping (which we haven't covered in class). The following code is supposed to ask the user for their date of birth and then print a list of celebrities who share their birthday (day and month only, ignoring the year).
```python
import requests
from bs4 import BeautifulSoup

print("Please enter your birthday:")
BD_Day = input("Day: ")
BD_Month = input("Month (1-12): ")

Months = ('January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December')
Month = dict(zip(range(12), Months))
BD_Month = int(BD_Month)

messy_url = ['http://www.famousbirthdays.com/', Month[BD_Month - 1], BD_Day, '.html']
url = ''.join(messy_url)

r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')
spans = soup.find_all('span', attrs={'class': 'title'})
for span in spans:
    print(span.string)
```
The code is supposed to search the page at `url`. However, it always prints a list of people born on November 6, whatever date I enter.
It also prints only 5 of the 48 names on the page: numbers 1 to 6, oddly skipping number 5.
My two main issues are the wrong date and the incomplete list of names. Any input would be appreciated.
Thanks.
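I would say that your error is coming from the URL or from the `span` tags, because the website holds all the people inside `a` elements inside `div` elements. Your code also builds the month part of the URL with a capital letter (for example `November6.html`), whereas my version below lowercases it.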
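The question's code joins the pieces into `http://www.famousbirthdays.com/November6.html`, with a capitalized month. The answer's code lowercases the month name instead. A small helper makes that URL-building step easy to check on its own.

What follows is a minimal sketch. It assumes, as the answer's code does, that pages are named with a lowercase month followed by the day number. The helper name `build_birthday_url` is hypothetical. The helper also uses `datetime.date` to reject impossible dates such as April 31 before any request is made.

```python
import datetime

# Assumption (taken from the answer's code, not from any site documentation):
# pages are named like "november6.html", i.e. lowercase month name + day number.
BASE_URL = "https://www.famousbirthdays.com/"

MONTHS = ('January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December')


def build_birthday_url(day, month):
    """Hypothetical helper: turn a day and a month (1-12) into a page URL."""
    day = int(day)      # also normalizes input like "06" to 6
    month = int(month)
    # Validate before indexing MONTHS; 2000 is a leap year, so Feb 29 passes.
    # Raises ValueError for e.g. month 13 or April 31.
    datetime.date(2000, month, day)
    return BASE_URL + MONTHS[month - 1].lower() + str(day) + ".html"


if __name__ == "__main__":
    print(build_birthday_url("6", "11"))  # https://www.famousbirthdays.com/november6.html
```

Validating first matters here. Without it, a month of `0` would silently index `MONTHS[-1]` and produce a December URL.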
So, here's how I did it:
```python
import requests
from bs4 import BeautifulSoup

# ask for birthday
print("Please enter your birthday:")
BD_Day = input("Day: ")
BD_Month = input("Month (1-12): ")
Months = ('January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December')

# make URL (month name lowercased, e.g. "november6.html")
url = "https://www.famousbirthdays.com/" + Months[int(BD_Month) - 1].lower() + BD_Day + ".html"

# make HTTP request
response = requests.get(url=url)

# parse HTML
page = BeautifulSoup(response.content, 'html.parser')

# find list of all people based on website's HTML
all_people = page.find("div", {"class": "people-list"}).find_all("a", {"class": "person-item"})

# show all people
for person in all_people:
    print(person.find("div", {"class": "info"}).find("div", {"class": "name"}).get_text().strip())
```
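The answer's code relies on a specific nesting: each name sits in `div.name`, inside `div.info`, inside `a.person-item`, inside `div.people-list`. That nesting can be illustrated without network access or third-party packages.

This sketch swaps in the standard library's `html.parser.HTMLParser` in place of BeautifulSoup. It runs on a small hypothetical HTML snippet that only copies the class names used in the answer's code. The live page's markup is not reproduced here and may differ or change over time.

```python
from html.parser import HTMLParser

# Hypothetical markup mirroring only the class names the answer's code targets.
SAMPLE_HTML = """
<div class="people-list">
  <a class="person-item" href="#">
    <div class="info"><div class="name">Person One</div><div class="age">30</div></div>
  </a>
  <a class="person-item" href="#">
    <div class="info"><div class="name">
      Person Two
    </div></div>
  </a>
</div>
"""


class NameCollector(HTMLParser):
    """Collect the text of every div.name that sits inside an a.person-item."""

    def __init__(self):
        super().__init__()
        self.in_person = False  # currently inside <a class="person-item">?
        self.name_depth = 0     # > 0 while inside <div class="name">
        self.current = []       # text fragments of the name being read
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "person-item" in classes:
            self.in_person = True
        elif tag == "div" and self.name_depth:
            self.name_depth += 1  # a div nested inside the name div
        elif tag == "div" and self.in_person and "name" in classes:
            self.name_depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "div" and self.name_depth:
            self.name_depth -= 1
            if self.name_depth == 0:
                # Joining and stripping mirrors get_text().strip() in the answer.
                self.names.append("".join(self.current).strip())
        elif tag == "a":
            self.in_person = False

    def handle_data(self, data):
        if self.name_depth:
            self.current.append(data)


parser = NameCollector()
parser.feed(SAMPLE_HTML)
print(parser.names)  # ['Person One', 'Person Two']
```

The `age` text is never collected, because only data inside a `div.name` that belongs to an `a.person-item` is kept. This is the same filtering the answer's chained `find`/`find_all` calls perform.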
I hope I could help!