Rutger Semp

Reputation: 43

pull specific iteration output from for loop

I've been writing a function that scrapes posts from the website www.meh.ro. I want it to pull a random post from a random page, but the way I've built it, it scrapes ALL posts by iterating over the HTML with a for loop, and I only need to return the output from a single post. I've been searching around and racking my brain for a simple solution, but I suppose I've got writer's block. I was hoping someone might have a brilliant idea I'm missing.

My code:

from random import randint
from urllib import urlopen
# from urllib import urlretrieve
from bs4 import BeautifulSoup


hit = False
while not hit:
    link = 'http://www.meh.ro/page/' + str(randint(1, 1000))
    print link, '\n---\n\n'

    try:
        source = urlopen(link).read()
        soup = BeautifulSoup(source)

        for tag in soup.find_all('div'):
            try:
                if tag['class'][1] == 'post':
                    # print tag.prettify('utf-8'), '\n\n'
                    title = tag.h2.a.string
                    imageURL = tag.p.a['href']
                    sourceURL = tag.div.a['href'].split('#')[0]

                    print title
                    print imageURL
                    print sourceURL
                    print '\n'
                    hit = True

            except Exception, e:
                # Divs without a class, or with fewer than two class names,
                # raise KeyError/IndexError; only report other errors.
                if not isinstance(e, (IndexError, KeyError)):
                    print 'try2: ', type(e), '\n', e

    except Exception, e:
        print 'try1: ', type(e), '\n', e

I considered doing it based on an idea I used elsewhere in my code to set the chance a specific entry was chosen, which was to add elements n times to a list in order to increase or decrease the chance of them being pulled from it:

def content_image():
    l = []
    l.extend(['imgur()' for i in range(90)])
    l.extend(['explosm()' for i in range(10)])

    return eval(l[randint(0, len(l) - 1)])

It would work, but I'm asking around regardless because I'm sure someone more experienced than me can work out a better solution.
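For reference, the weighting idea in content_image() can be sketched without eval() and without building a 100-element list, by pairing each function with a weight and walking the cumulative total. This is only a minimal sketch: the imgur() and explosm() bodies below are hypothetical stand-ins for the real scrapers, and weighted_pick() is a helper name made up for illustration.

```python
import random


def imgur():
    # Hypothetical stand-in for the real imgur() scraper.
    return 'imgur image'


def explosm():
    # Hypothetical stand-in for the real explosm() scraper.
    return 'explosm image'


def weighted_pick(weighted_funcs, rng=random):
    """Pick one function with probability proportional to its weight."""
    total = sum(weight for _, weight in weighted_funcs)
    threshold = rng.uniform(0, total)
    running = 0
    for func, weight in weighted_funcs:
        running += weight
        if threshold < running:
            return func
    # Fallback in case floating-point rounding pushes threshold up to total.
    return weighted_funcs[-1][0]


def content_image():
    # Same 90/10 split as the list-based version.
    func = weighted_pick([(imgur, 90), (explosm, 10)])
    return func()
```

Storing the function objects themselves, rather than strings such as 'imgur()', means nothing has to be evaluated at pick time.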

Upvotes: 1

Views: 181

Answers (1)

Martijn Pieters

Reputation: 1122072

To pick one post at random, you still have to loop through all of them and collect them in a list:

import random

posts = []
for tag in soup.find_all('div', class_='post'):
    title = tag.h2.a.string
    imageURL = tag.p.a['href']
    sourceURL = tag.div.a['href'].split('#', 1)[0]

    posts.append((title, imageURL, sourceURL))

title, imageURL, sourceURL = random.choice(posts)

This code collects all posts (title, image URL, source URL) into a list, then uses random.choice() to pick a random entry from that list.
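One thing to watch for: random.choice() raises IndexError when the list is empty, which can happen if a randomly chosen page contains no matching posts. A minimal sketch of a guard follows, assuming the posts have already been collected as tuples; pick_random_post() and the sample data are hypothetical names used only for illustration.

```python
import random


def pick_random_post(posts, rng=random):
    """Return one (title, image_url, source_url) tuple, or None if empty."""
    if not posts:
        return None
    return rng.choice(posts)


# Hypothetical sample data standing in for the scraped tuples.
sample_posts = [
    ('First post', 'http://example.com/1.jpg', 'http://example.com/1'),
    ('Second post', 'http://example.com/2.jpg', 'http://example.com/2'),
]

picked = pick_random_post(sample_posts)
if picked is not None:
    title, image_url, source_url = picked
```

Returning None for an empty page lets the calling loop try another random page instead of failing.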

Upvotes: 1
