Reputation: 35
I am stuck on a Python programming problem involving BeautifulSoup.
At first, I needed to create a function that would extract all h3 tags from the source of a web page. I did this as follows:
    from bs4 import BeautifulSoup

    def parseUsingSoup(content):
        # Parse the HTML passed in (rather than a global) and return every <h3> tag
        soup = BeautifulSoup(content, 'html.parser')
        return soup.findAll('h3')
The website I am trying to parse is this one: http://www.auc.nl/news-events/events-and-lectures/events-and-lectures.html?page=1&pageSize=40
It contains only one h3 tag. The problem now asks me to extend my function so that it also returns all the content related to that tag within p tags. It also asks for a list of events, each as a four-tuple giving the date, the title, the type and the description of the event.
I don't really know how to do this. I tried all kinds of different things, but nothing gives me the right results. Thank you in advance.
Upvotes: 3
Views: 4557
Reputation: 7469
Here is one way you can get all the <p> tags below the <h3>:
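    from bs4 import BeautifulSoup
    from urllib.request import urlopen  # Python 3; on Python 2 use urllib2.urlopen

    url = 'http://www.auc.nl/news-events/events-and-lectures/events-and-lectures.html?page=1&pageSize=40'
    soup = BeautifulSoup(urlopen(url), 'html.parser')

    for h3 in soup.findAll('h3'):
        # Print every <p> tag that appears after this <h3> in the document
        for p in h3.findAllNext('p'):
            print(p)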
Then you can parse this output into a list as you see fit.
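As a rough sketch of that last step, the following collects the text of the <p> tags that follow each <h3> and groups it into four-tuples. The real page's layout is not shown in this thread, so the markup in SAMPLE_HTML, the helper names paragraphs_after_headings and chunk_into_events, and the assumption that each event is four consecutive <p> tags (date, title, type, description) are all hypothetical and would need adjusting to the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (its layout is an assumption).
SAMPLE_HTML = """
<div>
  <h3>Events</h3>
  <p>12 May 2014</p>
  <p>Guest lecture</p>
  <p>Lecture</p>
  <p>An introduction to web scraping.</p>
</div>
"""


def paragraphs_after_headings(html):
    """Return a list of (heading text, [paragraph texts]) pairs, one per <h3>."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for h3 in soup.find_all('h3'):  # find_all is the current spelling of findAll
        texts = []
        # Walk the tags that follow the heading at the same nesting level
        for sibling in h3.find_next_siblings():
            if sibling.name == 'h3':
                break  # stop once the next heading starts
            if sibling.name == 'p':
                texts.append(sibling.get_text(strip=True))
        results.append((h3.get_text(strip=True), texts))
    return results


def chunk_into_events(texts, size=4):
    """Group a flat list into tuples of `size` items, dropping any incomplete tail.

    Assumes each event is `size` consecutive paragraphs (hypothetical layout).
    """
    return [tuple(texts[i:i + size])
            for i in range(0, len(texts) - size + 1, size)]


sections = paragraphs_after_headings(SAMPLE_HTML)
events = chunk_into_events(sections[0][1])
print(events)
```

If the paragraphs on the real page are nested inside other tags rather than being siblings of the heading, find_all_next('p') may be a better fit than find_next_siblings().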
Upvotes: 4