I'm trying to get some text from http://rss.cnn.com/rss/money_markets.rss, but when I run the code I keep getting None as the output. In case it helps: I'm trying to get all the small headlines from the page, plus the text under each of them, as plain text. Thank you!
import requests
import bs4
from bs4 import BeautifulSoup
web = requests.get("http://rss.cnn.com/rss/money_markets.rss")
start = bs4.BeautifulSoup(web.text, 'lxml')
scrape = start.find(".regularitem")
for i in scrape:
    print(scrape)
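Your browser renders the data at http://rss.cnn.com/rss/money_markets.rss in a user-friendly way (i.e. as HTML), but the data itself is actually XML. You can check this with print(response.headers['content-type']), which returns 'text/xml; charset=ISO-8859-1'. Hence, what you are after are the item XML elements. Note also that find() expects a tag name, not a CSS selector, so find(".regularitem") searches for a tag literally named ".regularitem" and returns None (CSS selectors go through select() or select_one()). Finally, I would suggest using find_all() to get all the matching elements instead of find(), which returns only the first one.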
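To see the find() versus find_all() difference without hitting the network, here is a minimal sketch run against a small, made-up RSS snippet (SAMPLE_RSS is hypothetical, not CNN's actual feed). It also reproduces the find(".regularitem") call from the question. It uses the "html.parser" backend that ships with BeautifulSoup instead of 'lxml', only so the example needs no extra install.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily trimmed RSS document standing in for the real feed.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Markets</title>
<item><title>Headline one</title><description>Summary one</description></item>
<item><title>Headline two</title><description>Summary two</description></item>
</channel></rss>"""

# "html.parser" is bundled with BeautifulSoup, so no lxml is needed here.
soup = BeautifulSoup(SAMPLE_RSS, "html.parser")

# find() takes a tag name, not a CSS selector, so this looks for a tag
# literally called ".regularitem" and finds nothing.
missing = soup.find(".regularitem")

# find() stops at the first match; find_all() returns every match.
first_item = soup.find("item")
all_items = soup.find_all("item")

print(missing)                             # None
print(first_item.title.text)               # Headline one
print([i.title.text for i in all_items])   # ['Headline one', 'Headline two']
```

Only the item elements are searched for titles here, so the channel-level "Markets" title is not picked up.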
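import bs4
import requests

response = requests.get("http://rss.cnn.com/rss/money_markets.rss")
soup = bs4.BeautifulSoup(response.text, 'lxml')

# Each headline and the text under it live in an <item> element of the feed
for item in soup.find_all("item"):
    print(item.title.text)        # the headline
    print(item.description.text)  # the text under the headline
    print("\n")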
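If you would rather not depend on BeautifulSoup or lxml at all, the standard library's xml.etree.ElementTree can read the same item elements. This is a different technique from the one above, shown as a minimal sketch; parse_rss_items and SAMPLE_RSS are hypothetical names, and the sample is made up rather than taken from CNN's feed.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Return (title, description) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    # iter("item") walks the whole tree, much like find_all("item") in BeautifulSoup.
    for item in root.iter("item"):
        # findtext() returns the given default when the child element is missing.
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        items.append((title.strip(), description.strip()))
    return items


# Hypothetical sample; the second description is wrapped in CDATA to show
# that ElementTree hands back its contents as plain text.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>Headline one</title><description>Summary one</description></item>
<item><title>Headline two</title><description><![CDATA[Summary <b>two</b>]]></description></item>
</channel></rss>"""

for title, description in parse_rss_items(SAMPLE_RSS):
    print(title)
    print(description)
    print()
```

To use it on the live feed, pass requests.get(url).content (bytes) rather than .text, so the parser can honor the encoding declared in the feed itself.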