PieSquare

Reputation: 327

Beautifulsoup: Loop through class and extract information

import requests
from bs4 import BeautifulSoup

res = requests.get('https://www.amcham.com.au/web/Events/Web/Events/Upcoming_Events.aspx?hkey=6f098583-ca3d-4a6f-87de-cd4f13d50b11')
soup = BeautifulSoup(res.text,"lxml")

event_title = soup.find('span', {'class': 'eventTitle'})
event_location = soup.find('span', {'class': 'city'})
event_place =  soup.find('span', {'class': 'place'})
event_date =  soup.find('span', {'class': 'eventDate'})


print(event_title.text)
print(event_place.text,event_location.text)
print(event_date.text)

I'm using this code to extract the title, location, and date of an upcoming event from the website. As written, it only returns the first event on the page.

I'd like to loop through all the events on the page and extract the title, location, place, and date of each one. Can someone help me with this?

Upvotes: 1

Views: 5047

Answers (3)

RomanPerekhrest

Reputation: 92854

Optimized solution with a single select query, grouping the matched text by class name:

import requests, pprint
import collections
from bs4 import BeautifulSoup

res = requests.get('https://www.amcham.com.au/web/Events/Web/Events/Upcoming_Events.aspx?hkey=6f098583-ca3d-4a6f-87de-cd4f13d50b11')
soup = BeautifulSoup(res.text,"lxml")

event_data = collections.defaultdict(list)
for el in soup.select('span.eventTitle, span.city, span.place, span.eventDate'):
    event_data[el['class'][0]].append(el.text)

pprint.pprint(dict(event_data))

Pretty output:

{'city': ['The Rocks, NSW',
          'Brisbane, QLD',
          'Sydney, NSW',
          'Richmond, VIC',
          'Adelaide, SA',
          'Perth, WA',
          'Melbourne, VIC',
          'Pyrmont, NSW',
          'South Wharf, VIC'],
 'eventDate': ['Mon22Jan',
               'Mon5Feb',
               'Fri9Feb',
               'Tue20Feb',
               'Fri23Feb',
               'Thu8Mar',
               'Wed14Mar',
               'Thu15Mar',
               'Mon19Mar',
               'Tue20Mar',
               'Wed21Mar',
               'Fri23Mar',
               'Wed30May'],
 'eventTitle': ['AMCHAM & UNITED AIRLINES PRESENT',
                'Super Bowl LII',
                'AMCHAM SUPER BOWL LII - SYDNEY',
                'SUPER BOWL LII NETWORKING EVENT',
                'MR SANJEEV GUPTA',
                'Meet the Minister Luncheon with The Hon. Alannah MacTiernan',
                'ADVANCING WOMEN IN LEADERSHIP',
                'GLOBAL CITIZENS: DRIVING THE FUTURE OF EXPERIENCE',
                "INTERNATIONAL WOMEN'S DAY",
                'THE EXECUTIVE SPIN ON SERVICE',
                'TOLL GROUP',
                'THE SCIENCE BEHIND LEADERSHIP',
                'PEAK PERFORMANCE LEADERSHIP SUMMIT',
                'WORLD BUSINESS FORUM: TWO-DAY EVENT'],
 'place': ['Shangri-La Hotel',
           'The Pav Bar',
           'Hotel CBD',
           'Richmond Football Club',
           'InterContinental',
           'CBD Venue',
           'Karstens',
           'Sydney CBD',
           'Plaza Ballroom',
           'RACV Club',
           'The Star',
           'Melbourne Convention Centre']}
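
Note that the four lists in the output above have different lengths (9 cities, 13 dates, 14 titles, 12 places), so grouping by class name does not by itself tell you which city belongs to which title. Here is a minimal sketch of why that happens. The markup is a made-up sample (not the real page) in which the second event has no place or city, and it uses the built-in "html.parser" so it runs without lxml:

```python
import collections
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only: the second event lacks place/city.
SAMPLE_HTML = """
<table>
  <tr class="rgRow"><td>
    <span class="eventTitle">Event A</span>
    <span class="place">Venue A</span>
    <span class="city">Sydney, NSW</span>
    <span class="eventDate">Mon5Feb</span>
  </td></tr>
  <tr class="rgAltRow"><td>
    <span class="eventTitle">Event B</span>
    <span class="eventDate">Fri9Feb</span>
  </td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Same grouping as above: one list of texts per class name.
event_data = collections.defaultdict(list)
for el in soup.select("span.eventTitle, span.city, span.place, span.eventDate"):
    event_data[el["class"][0]].append(el.get_text(strip=True))

# Count how many values were collected for each class.
counts = {name: len(values) for name, values in event_data.items()}
print(counts)
```

If you need one record per event, it is probably safer to iterate over the event rows and look up each field inside the row.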

Upvotes: 1

SIM

Reputation: 22440

An alternative approach is to loop over each event row and pull the fields out of that row, so the values for one event stay together:

import requests
from bs4 import BeautifulSoup

res = requests.get('https://www.amcham.com.au/web/Events/Web/Events/Upcoming_Events.aspx?hkey=6f098583-ca3d-4a6f-87de-cd4f13d50b11')
soup = BeautifulSoup(res.text,"lxml")

for item in soup.find_all(class_=["rgRow","rgAltRow"]):
    event_title = item.find(class_='eventTitle').text
    event_location = item.find(class_='city').text
    event_place =  item.find(class_='place').text
    event_date =  item.find(class_='eventDate').text
    print(event_title,event_location,event_place,event_date)
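
One caveat: item.find(...) returns None when a row has no element with that class, and calling .text on None raises AttributeError. Below is a rough sketch of the same loop with a guard for missing fields. The helper names text_or_none and extract_events are my own, and the markup is a made-up sample rather than the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only: the second row has no place/city.
SAMPLE_HTML = """
<table>
  <tr class="rgRow"><td>
    <span class="eventTitle">Event A</span>
    <span class="place">Venue A</span>
    <span class="city">Sydney, NSW</span>
    <span class="eventDate">Mon5Feb</span>
  </td></tr>
  <tr class="rgAltRow"><td>
    <span class="eventTitle">Event B</span>
    <span class="eventDate">Fri9Feb</span>
  </td></tr>
</table>
"""


def text_or_none(row, class_name):
    """Return the stripped text of the first match in this row, or None."""
    el = row.find(class_=class_name)
    return el.get_text(strip=True) if el is not None else None


def extract_events(html):
    """Build one dict per event row; missing fields become None."""
    soup = BeautifulSoup(html, "html.parser")
    events = []
    for row in soup.find_all(class_=["rgRow", "rgAltRow"]):
        events.append({
            "title": text_or_none(row, "eventTitle"),
            "city": text_or_none(row, "city"),
            "place": text_or_none(row, "place"),
            "date": text_or_none(row, "eventDate"),
        })
    return events


events = extract_events(SAMPLE_HTML)
for event in events:
    print(event)
```

Returning dicts instead of printing inside the loop also makes it easier to write the results to CSV or JSON later.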

Upvotes: 1

Ajax1234

Reputation: 71451

find only returns the first match. You need to use find_all (spelled findAll in older code) to get the full list of matches for each class:

event_title = [i.text for i in soup.find_all('span', {'class': 'eventTitle'})]
event_location = [i.text for i in soup.find_all('span', {'class': 'city'})]
event_place = [i.text for i in soup.find_all('span', {'class': 'place'})]
event_date = [i.text for i in soup.find_all('span', {'class': 'eventDate'})]
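
If you then want one record per event, the four lists can be combined with zip. This is only a sketch, and it assumes every event has all four fields so the lists line up; the length check is there because zip silently truncates to the shortest list. The markup is a made-up sample and the texts helper is my own name:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only: both events have all four fields.
SAMPLE_HTML = """
<div>
  <span class="eventTitle">Event A</span>
  <span class="place">Venue A</span>
  <span class="city">Sydney, NSW</span>
  <span class="eventDate">Mon5Feb</span>
</div>
<div>
  <span class="eventTitle">Event B</span>
  <span class="place">Venue B</span>
  <span class="city">Perth, WA</span>
  <span class="eventDate">Fri9Feb</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def texts(class_name):
    """Return the stripped text of every span with the given class."""
    return [el.get_text(strip=True) for el in soup.find_all("span", class_=class_name)]


titles = texts("eventTitle")
cities = texts("city")
places = texts("place")
dates = texts("eventDate")

# zip stops at the shortest list, so refuse to pair misaligned data.
if len({len(titles), len(cities), len(places), len(dates)}) != 1:
    raise ValueError("field lists have different lengths; cannot pair them safely")

events = list(zip(titles, cities, places, dates))
for event in events:
    print(event)
```

On a page where some events are missing a field, this check will fail, and extracting the fields row by row is likely the better option.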

Upvotes: 3
