Daveabuk

Reputation: 43

Learning with Beautifulsoup

I'm trying to pull data from a website and have been reading and trying to learn for weeks. Here's what I'm trying:

import requests
from bs4 import BeautifulSoup as Soup

req = requests.get('http://www.rushmore.tv/schedule')
soup = Soup(req.text, "html.parser")

soup.find('home-section-wrap center', id="section-home")
print soup.find

but it's returning something to do with Steam, which seems completely random considering that nothing I am doing is related to Steam.

<bound method BeautifulSoup.find of \n<td class="listtable_1" height="16">\n<a href="http://steamcommunity.com/profiles/76561198134729239" target="_blank">\n        76561198134729239\n    </a>\n</td>>

What I'm trying to do is scrape a div by its ID and print the contents. I'm extremely new to this. Cheers.
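Note: print soup.find prints the bound method itself (hence the <bound method BeautifulSoup.find of ...> output) rather than calling it. A minimal sketch of what the intended call might look like, assuming the target really is a <div> with id "section-home" and that requests is imported as above:

import requests
from bs4 import BeautifulSoup as Soup

req = requests.get('http://www.rushmore.tv/schedule')
soup = Soup(req.text, "html.parser")

# pass the tag name first; the id goes in as a keyword argument
section = soup.find('div', id="section-home")

# call find() and print its result, not the method object itself
print(section.get_text() if section else "section not found")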

Upvotes: 2

Views: 257

Answers (2)

Keyur Potdar

Reputation: 7238

Use this:

import requests
from bs4 import BeautifulSoup

r = requests.get('http://www.rushmore.tv/schedule')
soup = BeautifulSoup(r.text, "html.parser")

for row in soup.find('ul', id='myUL').find_all('li'):  # each <li> in <ul id="myUL"> is one schedule row
    print(row.text)

Partial Output:

10:30 - 13:30 Olympics: Women's Curling, Canada vs China (CA Coverage) - Channel 21
10:30 - 11:30 Olympics: Freestyle, Men's Half Pipe (US Coverage) - Channel 34
11:30 - 14:45 Olympics: BBC Coverage - Channel 92
11:30 - 19:30 Olympics: BBC Red Button Coverage - Channel 103
11:30 - 13:30 Olympics: Women's Curling, Great Britain vs Japan - Channel 105
13:00 - 15:30 Olympics: Men's Ice Hockey: Slovenia vs Norway - Channel 11
13:30 - 15:30 Olympics: Men's Ice Hockey: Slovenia vs Norway (JIP) - Channel 21
13:30 - 21:30 Olympics: DE Coverage - Channel 88
14:45 - 18:30 Olympics: BBC Coverage - Channel 91
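If the rows are wanted as structured data rather than plain text, the same loop can be extended with a small regular expression. This is a sketch that assumes every row follows the "HH:MM - HH:MM event - Channel N" pattern visible in the partial output above:

import re
import requests
from bs4 import BeautifulSoup

r = requests.get('http://www.rushmore.tv/schedule')
soup = BeautifulSoup(r.text, "html.parser")

# "10:30 - 13:30 Olympics: ... - Channel 21" -> (times, event, channel)
row_pattern = re.compile(r'^(\d{1,2}:\d{2} - \d{1,2}:\d{2}) (.+) - Channel (\d+)$')

schedule = []
for row in soup.find('ul', id='myUL').find_all('li'):
    match = row_pattern.match(row.get_text(strip=True))
    if match:  # ignore rows that don't follow the expected pattern
        times, event, channel = match.groups()
        schedule.append((times, event, int(channel)))

print(schedule[:3])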

Upvotes: 2

Alex Bodnya

Reputation: 118

Try running the following code:

import urllib2
from bs4 import BeautifulSoup

quote_page='http://www.rushmore.tv/schedule'
def page_scrapper(quote_page):
    print(quote_page+' is being processed... ')
    page = urllib2.urlopen(quote_page)  # let's open the page...
    soup = BeautifulSoup(page, 'html.parser')  # and now we parse it with the BSoup parser
    box = soup.find('ul', attrs={'id': 'myUL'})  # save the contents of the <ul> tag with id 'myUL' (it contains the schedule)
    print(box)  # and print it!
page_scrapper(quote_page)

This should do the trick.

EDIT - added some lines of code
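urllib2 only exists on Python 2; on Python 3 the same scraper could be sketched with urllib.request instead (the BeautifulSoup part is unchanged):

from urllib.request import urlopen
from bs4 import BeautifulSoup

quote_page = 'http://www.rushmore.tv/schedule'

def page_scraper(quote_page):
    print(quote_page + ' is being processed... ')
    page = urlopen(quote_page)                   # open the page
    soup = BeautifulSoup(page, 'html.parser')    # parse it with the built-in parser
    box = soup.find('ul', attrs={'id': 'myUL'})  # the <ul id="myUL"> holds the schedule
    print(box)                                   # and print it

page_scraper(quote_page)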

Upvotes: 1
