Reputation: 43
I've been writing a function that scrapes posts from the website www.meh.ro. I want it to pull a random post from a random page, but the way I've built it, it scrapes ALL posts by iterating over the HTML with a for loop, and I only need to return the output from a single post. I've been searching around and racking my brain for a simple solution, but I suppose I've got writer's block. I was hoping someone might have a brilliant idea I'm missing.
My code:
from random import randint
from urllib import urlopen
# from urllib import urlretrieve
from bs4 import BeautifulSoup
hit = False
while hit == False:
    link = 'http://www.meh.ro/page/' + str(randint(1, 1000))
    print link, '\n---\n\n'
    try:
        source = urlopen(link).read()
        soup = BeautifulSoup(source)
        for tag in soup.find_all('div'):
            try:
                if tag['class'][1] == 'post':
                    # print tag.prettify('utf-8'), '\n\n'
                    title = tag.h2.a.string
                    imageURL = tag.p.a['href']
                    sourceURL = tag.div.a['href'].split('#')[0]
                    print title
                    print imageURL
                    print sourceURL
                    print '\n'
                    hit = True
            except Exception, e:
                # Only report errors other than the expected IndexError/KeyError
                if not isinstance(e, (IndexError, KeyError)):
                    print 'try2: ', type(e), '\n', e
    except Exception, e:
        print 'try1: ', type(e), '\n', e
I considered doing it based on an idea I used elsewhere in my code to set the chance that a specific entry is chosen: add each element n times to a list to increase or decrease the chance of it being pulled from that list:
def content_image():
    l = []
    # 90 entries for imgur() and 10 for explosm() give a 90/10 split
    l.extend(['imgur()' for i in range(90)])
    l.extend(['explosm()' for i in range(10)])
    return eval(l[randint(0, len(l)-1)])
It would work, but I'm asking around regardless because I'm sure someone more experienced than me can work out a better solution.
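For reference, the same weighting idea can probably be written without building a 100-element list or calling eval. This is only a sketch, assuming Python 3.6 or newer (where random.choices() accepts weights); the imgur() and explosm() bodies here are hypothetical stand-ins for the real functions, and the rng parameter is added only so the behaviour can be seeded.

```python
import random


def imgur():
    # Hypothetical stand-in for the real imgur() function
    return 'imgur'


def explosm():
    # Hypothetical stand-in for the real explosm() function
    return 'explosm'


def content_image(rng=random):
    # Store the functions themselves instead of strings, so no eval is needed
    sources = [imgur, explosm]
    weights = [90, 10]  # relative chances, same 90/10 split as the list version
    chosen = rng.choices(sources, weights=weights, k=1)[0]
    return chosen()
```

Changing the odds then only means editing the weights list.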
Upvotes: 1
Views: 181
Reputation: 1122072
To pick one post at random, you still have to loop through all of them and collect them in a list:
import random
posts = []
for tag in soup.find_all('div', class_='post'):
    title = tag.h2.a.string
    imageURL = tag.p.a['href']
    sourceURL = tag.div.a['href'].split('#', 1)[0]
    posts.append((title, imageURL, sourceURL))
title, imageURL, sourceURL = random.choice(posts)
This code collects all posts (title, image URL, source URL) into a list, then uses random.choice() to pick a random entry from that list.
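One thing to watch for: random.choice() raises IndexError on an empty list, which can happen if the randomly chosen page has no posts. Below is a minimal sketch of how the random page and the random pick might be combined with a retry. The names pick_random_post, fetch_posts, max_tries and fake_fetch are made up for illustration; fetch_posts is assumed to be any callable that takes a page number and returns the list of (title, imageURL, sourceURL) tuples built by the loop above.

```python
import random


def pick_random_post(fetch_posts, max_page=1000, max_tries=10, rng=random):
    """Return one random post tuple from a random page, or None if every try fails.

    fetch_posts is a hypothetical callable: page number -> list of
    (title, imageURL, sourceURL) tuples.
    """
    for _ in range(max_tries):
        page = rng.randint(1, max_page)
        posts = fetch_posts(page)
        if posts:  # skip empty pages; random.choice() would raise IndexError
            return rng.choice(posts)
    return None


def fake_fetch(page):
    # Stand-in for real scraping so the sketch runs offline (made-up data)
    if page % 2:  # pretend odd-numbered pages have no posts
        return []
    return [('title %d-%d' % (page, i), 'img', 'src') for i in range(3)]


post = pick_random_post(fake_fetch, max_page=20, max_tries=50)
```

In real use, fetch_posts would wrap the urlopen, BeautifulSoup and find_all steps for a single page.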
Upvotes: 1