MIGUEL GP

Reputation: 63

Web scraping with BeautifulSoup Python returns None

I'm trying to get some text from http://rss.cnn.com/rss/money_markets.rss, but when I run the code I keep getting None as output. If it helps, I'm trying to get all the small headlines from the page and the text under each of them as plain text. Thank you!

import requests
import bs4
from bs4 import BeautifulSoup
web = requests.get("http://rss.cnn.com/rss/money_markets.rss")
start = bs4.BeautifulSoup(web.text, 'lxml')
scrape = start.find(".regularitem")
for i in scrape:
    print(scrape)

Upvotes: 0

Views: 111

Answers (1)

Dušan Maďar

Reputation: 9849

The browser renders the data at http://rss.cnn.com/rss/money_markets.rss in a user-friendly way, i.e. as HTML, but the data itself is actually XML. You can check this with print(response.headers['content-type']), which returns 'text/xml; charset=ISO-8859-1'. Hence, what you are after are the item XML elements. Note also that find() takes a tag name, not a CSS selector, so find(".regularitem") matches nothing and returns None. I would suggest using find_all() to get all the elements instead of find(), which returns just the first one.

import bs4
import requests

# Download the feed; the body is XML, not HTML
response = requests.get("http://rss.cnn.com/rss/money_markets.rss")
soup = bs4.BeautifulSoup(response.text, 'lxml')
# Each headline and its summary live in one <item> element
for item in soup.find_all("item"):
    print(item.title.text)
    print(item.description.text)
    print("\n")
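
Since the feed is XML, the same extraction can also be sketched with the standard library's xml.etree.ElementTree instead of BeautifulSoup. This is a different technique from the one above, shown only as an illustration: SAMPLE_RSS and extract_items are hypothetical names, and the sample feed is made-up data standing in for the downloaded response, so the real feed's contents may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for the downloaded response body.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First headline</title>
      <description>First summary</description>
    </item>
    <item>
      <title>Second headline</title>
      <description>Second summary</description>
    </item>
  </channel>
</rss>"""


def extract_items(rss_text):
    """Return a list of (title, description) tuples, one per <item>."""
    root = ET.fromstring(rss_text)
    results = []
    # iter("item") walks the whole tree, so items nested under <channel> are found.
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        results.append((title.strip(), description.strip()))
    return results


headlines = extract_items(SAMPLE_RSS)
for title, description in headlines:
    print(title)
    print(description)
    print()
```

With a live feed, one option would be to pass response.content (bytes) to extract_items so that the parser can honor the encoding declared in the document.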

Upvotes: 1
