Milano

Reputation: 18735

Different results for the very same method

I'm trying to scrape and parse some data from a web page. The problem is that the script behaves differently from one attempt to the next.

import mLib
import requests
import urlparse

URL = 'http://www.distrelec.sk/'

class base():

    def __init__(self):
        self.soup = mLib.getSoup(URL)

    def get_info(self,url):

        soup = mLib.getSoup(url)
        up_left_table_outer = soup.find('table',class_='validate-checkbox-group')

        for row_outer in up_left_table_outer.find_all('tr'):
            key_value = row_outer.find_all('td')
            key = key_value[0].label.text
            value = key_value[1].span.text
            yield key,value


bs = base()
for i in range(1,20):
    print dict(bs.get_info('http://www.distrelec.sk/sk/socket-mm-cerna-multi-contact-lb4-black/p/11034944?q=*&filter_Buyable=1&filter_Category3=Laborat%C3%B3rne+konektory&page=1&origPageSize=10&simi=99.8'))

Here is part of the output (every line should be identical). As you can see, the values for the keys Contact A and Contact B sometimes differ. The cause is probably that the web page has more than one line of text for each of those fields:

{u'Barva': u'\u010cern\xe1', u'Jmenovit\xe9 nap\u011bt\xed': u'30 VAC 60 VDC, 20 A', u'Contact A': u'Z\xe1suvka', u'Velikost': u'\xf8 4 mm', u'Contact B': u'M6'}
{u'Barva': u'\u010cern\xe1', u'Jmenovit\xe9 nap\u011bt\xed': u'30 VAC 60 VDC, 20 A', u'Contact A': u'\xf8 4 mm', u'Velikost': u'\xf8 4 mm', u'Contact B': u'P\xe1jen\xed'}
{u'Barva': u'\u010cern\xe1', u'Jmenovit\xe9 nap\u011bt\xed': u'30 VAC 60 VDC, 20 A', u'Contact A': u'Z\xe1suvka', u'Velikost': u'\xf8 4 mm', u'Contact B': u'M6'}
{u'Barva': u'\u010cern\xe1', u'Jmenovit\xe9 nap\u011bt\xed': u'30 VAC 60 VDC, 20 A', u'Contact A': u'\xf8 4 mm', u'Velikost': u'\xf8 4 mm', u'Contact B': u'M6'}
{u'Barva': u'\u010cern\xe1', u'Jmenovit\xe9 nap\u011bt\xed': u'30 VAC 60 VDC, 20 A', u'Contact A': u'Z\xe1suvka', u'Velikost': u'\xf8 4 mm', u'Contact B': u'M6'}
{u'Barva': u'\u010cern\xe1', u'Jmenovit\xe9 nap\u011bt\xed': u'30 VAC 60 VDC, 20 A', u'Contact A': u'Z\xe1suvka', u'Velikost': u'\xf8 4 mm', u'Contact B': u'M6'}
{u'Barva': u'\u010cern\xe1', u'Jmenovit\xe9 nap\u011bt\xed': u'30 VAC 60 VDC, 20 A', u'Contact A': u'\xf8 4 mm', u'Velikost': u'\xf8 4 mm', u'Contact B': u'P\xe1jen\xed'}
{u'Barva': u'\u010cern\xe1', u'Jmenovit\xe9 nap\u011bt\xed': u'30 VAC 60 VDC, 20 A', u'Contact A': u'\xf8 4 mm', u'Velikost': u'\xf8 4 mm', u'Contact B': u'M6'}

Do you have any idea where the problem is?

Upvotes: 0

Views: 46

Answers (1)

Cyphase

Reputation: 12022

The content served at that URL changes between requests. Try refreshing the page a few times.

Compare one version of the page, where Zásuvka and Pájení are listed first, with another, where ø 4 mm and M6 are listed first.

The order of the values for each Contact changes independently, so there are four possible combinations in this case.
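
If the goal is a stable result regardless of the order the page happens to serve, one option is to keep every value for a repeated key instead of letting dict() keep only the last one it sees. The following is a minimal sketch, not the asker's code: collect_values, rows_first and rows_second are hypothetical names, and the sample rows are made-up stand-ins for what get_info() might yield on two different page loads.

```python
from collections import defaultdict


def collect_values(pairs):
    """Group (key, value) pairs so that repeated keys keep every value."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Sort each list so the result does not depend on the order of the rows.
    return {key: sorted(values) for key, values in grouped.items()}


# Hypothetical sample data: the same four rows in two different orders,
# standing in for two loads of the page.
rows_first = [
    ('Contact A', u'Z\xe1suvka'), ('Contact A', u'\xf8 4 mm'),
    ('Contact B', u'P\xe1jen\xed'), ('Contact B', u'M6'),
]
rows_second = [
    ('Contact A', u'\xf8 4 mm'), ('Contact A', u'Z\xe1suvka'),
    ('Contact B', u'M6'), ('Contact B', u'P\xe1jen\xed'),
]

# dict() keeps only the last value per key, so the two orders disagree.
plain_results_match = dict(rows_first) == dict(rows_second)

# Grouping and sorting gives the same result for both orders.
grouped_results_match = collect_values(rows_first) == collect_values(rows_second)
```

With the generator from the question, this would be called as collect_values(bs.get_info(url)) in place of dict(bs.get_info(url)), assuming it is acceptable for each key to map to a list of values.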

Upvotes: 1
