Reputation: 123
I'm trying to scrape a file I have written as a learning experiment. It looks like this:
<div class="container">
    <div class="date">1st</div>
    <div class="events">
        <div class="meeting">
            <span class="name">Bob</span>
        </div>
    </div>
    <div class="date">2nd</div>
    <div class="event">
        <div class="meeting">
            <span class="name">Emma</span>
            <span class="name">Frank</span>
            <span class="name">Charlie</span>
        </div>
    </div>
    <div class="date">3rd</div>
    <div class="event">
        <div class="meeting">
            <span class="name">Lisa</span>
            <span class="name">Tony</span>
        </div>
    </div>
</div>
I would like to scrape the data so that each date is returned together with the names that belong to it. For example:
data = [['1st', 'Bob'], ['2nd', 'Emma', 'Frank', 'Charlie'], ['3rd', 'Lisa', 'Tony']]
The problem I am having is that the date and event divs are on the same level, so the names are not nested under their date. When I loop through the document using the following:
for data in schedule_soup.find_all('div', 'container'):
    for date in data.find_all('div', 'date'):
        print(date)
    for name in data.find_all('span', 'name'):
        print(name)
I get this:
<div class="date">1st</div>
<div class="date">2nd</div>
<div class="date">3rd</div>
<span class="name">Bob</span>
<span class="name">Emma</span>
<span class="name">Frank</span>
<span class="name">Charlie</span>
<span class="name">Lisa</span>
<span class="name">Tony</span>
Upvotes: 1
Views: 77
Reputation: 133
You can use the zip function to pair each date with the meeting at the same position:
# soup is the already-parsed BeautifulSoup document
final_list = []
dates = soup.find_all('div', 'date')
meetings = soup.find_all('div', 'meeting')
for date1, meeting in zip(dates, meetings):
    # Start each row with the date, then add every name in that meeting
    temp_list = [date1.text]
    temp_list.extend(x.text for x in meeting.find_all('span'))
    final_list.append(temp_list)
print(final_list)
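For anyone who wants to run the zip approach end to end, here is a minimal self-contained sketch. It assumes the bs4 package is installed; the helper name pair_by_position and the trimmed copy of the question's markup are mine, purely for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (illustrative sample data).
html = """
<div class="container">
  <div class="date">1st</div>
  <div class="events"><div class="meeting">
    <span class="name">Bob</span>
  </div></div>
  <div class="date">2nd</div>
  <div class="event"><div class="meeting">
    <span class="name">Emma</span>
    <span class="name">Frank</span>
    <span class="name">Charlie</span>
  </div></div>
  <div class="date">3rd</div>
  <div class="event"><div class="meeting">
    <span class="name">Lisa</span>
    <span class="name">Tony</span>
  </div></div>
</div>
"""


def pair_by_position(markup):
    """Pair the n-th date div with the n-th meeting div (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    dates = soup.find_all("div", "date")
    meetings = soup.find_all("div", "meeting")
    rows = []
    for date, meeting in zip(dates, meetings):
        names = [span.get_text(strip=True) for span in meeting.find_all("span", "name")]
        rows.append([date.get_text(strip=True)] + names)
    return rows


print(pair_by_position(html))
```

Note that zip pairs purely by position, so this only lines up correctly when every date has exactly one meeting. If a date has no meeting, every later date is paired with the wrong names.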
Upvotes: 0
Reputation: 2079
Try the code below; it worked for me:
# soup is the already-parsed BeautifulSoup document
final_list = []
dates = soup.find_all('div', 'date')
# Look up the meetings once rather than on every iteration
meetings = soup.find_all('div', 'meeting')
for c in range(len(dates)):
    temp_list = [dates[c].text]
    # Search the matching meeting tag directly; re-parsing it is not needed
    for name in meetings[c].find_all('span', 'name'):
        temp_list.append(name.text)
    final_list.append(temp_list)
print(final_list)
[['1st', 'Bob'], ['2nd', 'Emma', 'Frank', 'Charlie'], ['3rd', 'Lisa', 'Tony']]
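Matching dates and meetings by index relies on every date having exactly one meeting. As a possible alternative, here is a sketch that walks from each date div to the div that follows it using find_next_sibling, so a date without events simply ends up with no names. It assumes bs4 is installed; the helper name group_by_date and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup

# Illustrative sample: the "2nd" date deliberately has no event block.
html = """
<div class="container">
  <div class="date">1st</div>
  <div class="event"><div class="meeting">
    <span class="name">Bob</span>
  </div></div>
  <div class="date">2nd</div>
  <div class="date">3rd</div>
  <div class="event"><div class="meeting">
    <span class="name">Lisa</span>
    <span class="name">Tony</span>
  </div></div>
</div>
"""


def group_by_date(markup):
    """Collect names from the div that directly follows each date div."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for date in soup.find_all("div", class_="date"):
        row = [date.get_text(strip=True)]
        # Next <div> on the same level; whitespace text nodes are skipped.
        block = date.find_next_sibling("div")
        # Ignore the sibling if it is just the next date.
        if block is not None and "date" not in block.get("class", []):
            row.extend(
                span.get_text(strip=True)
                for span in block.find_all("span", class_="name")
            )
        rows.append(row)
    return rows


print(group_by_date(html))
```

This version does not depend on the wrapper class being spelled event or events, since it only checks that the following div is not another date.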
Upvotes: 1