I have the following code:
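import urllib.request

def worldnews():
    count = 0
    # Download the page and split the raw HTML into lines (decode bytes to str first)
    html = urllib.request.urlopen("https://www.reddit.com/r/worldnews/").read().decode("utf-8")
    for line in html.splitlines():
        # Test each keyword separately; `"Paris" or "Putin" in line` is always true
        if "Paris" in line or "Putin" in line:
            count = count + 1
            print(line)
    print("Total found: ", count)
    print("----------------------")

worldnews()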
How can I find all Reddit posts on that page with "Paris" or "Putin" in the title? And is there a way to print the title of each post to the console? When I run this now, I get a lot of HTML code back.
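Written as if "Paris" or "Putin" in line:, the keyword test is always true. Python reads it as "Paris" or ("Putin" in line), because "in" binds more tightly than "or", and the non-empty string "Paris" is truthy, so every line passes. A small sketch comparing that expression with the intended check (the sample line is made up for illustration):

```python
line = "<p>some unrelated html</p>"  # hypothetical line containing neither keyword

# How Python actually reads the condition: "Paris" or ("Putin" in line)
buggy = "Paris" or "Putin" in line
# The intended check tests each keyword on its own
fixed = "Paris" in line or "Putin" in line

print(repr(buggy))  # 'Paris': "or" returns its first truthy operand
print(bool(buggy))  # True, so the if-branch would always run
print(fixed)        # False, so this line would be skipped
```

With more keywords, any(k in line for k in keywords) keeps the test compact.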
BeautifulSoup is the usual tool for parsing HTML in Python. You'll need to install it (the beautifulsoup4 package, plus lxml for the parser used below) and read through its documentation to work out exactly how to do what you're asking. Here's something to get you started:
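import urllib.request
from bs4 import BeautifulSoup

def worldnews():
    html = urllib.request.urlopen("https://www.reddit.com/r/worldnews/")
    # Parse the page with the lxml parser (requires the lxml package)
    soup = BeautifulSoup(html, "lxml")
    # Each post title sits in a <p class="title"> element
    titles = soup.find_all("p", {"class": "title"})
    for title in titles:
        print(title.text)

worldnews()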
When this is run, it gives an output that looks like this:
Paris attacks ringleader dead - French officials (bbc.com)
Company which raised price of AIDS drug by 5500% reports $14m quarterly losses. (pinknews.co.uk)
Syria/IraqSyrian man kills judge at ISIS Sharia Court for beheading his brother (en.abna24.com)
Putin Puts $50 Million Bounty on Heads of Metrojet Bombers (fortune.com)
...and so on for every title on the page. From here, filtering those titles for "Paris" or "Putin" and counting the matches should be fairly easy to code. :-)
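One minimal way to do that last step is to keep only the titles that contain a keyword and count them. This is a sketch: the function name filter_titles is hypothetical, and the sample list reuses titles from the output above. In the real script you would pass in the title.text values collected in the loop.

```python
def filter_titles(titles, keywords=("Paris", "Putin")):
    """Return the titles that contain at least one keyword (case-sensitive)."""
    return [t for t in titles if any(k in t for k in keywords)]


# Sample titles copied from the example output above
sample_titles = [
    "Paris attacks ringleader dead - French officials (bbc.com)",
    "Company which raised price of AIDS drug by 5500% reports $14m quarterly losses. (pinknews.co.uk)",
    "Putin Puts $50 Million Bounty on Heads of Metrojet Bombers (fortune.com)",
]

matches = filter_titles(sample_titles)
for title in matches:
    print(title)
print("Total found:", len(matches))
print("----------------------")
```

The match is case-sensitive. Comparing k.lower() against t.lower() would also catch titles such as "PUTIN".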