Tiago Redaelli

Reputation: 620

Iteration failure when using BeautifulSoup

I'm using BeautifulSoup to extract data from a web page, but it fails to iterate over the items in any season after the first. I can't see a reason for this behavior, since the nodes look exactly the same to me.

def scrape_show(show):
    source = requests.get(show.url).text
    soup = BeautifulSoup(source, 'lxml')

    # All seasons and episodes
    area = soup.find('div', class_='play_video-area-aside play_video-area-aside--related-videos play_video-area-aside--related-videos--titlepage')
    for article in area:
        if "season" in article.get('id'):
            season = article.h2.a.find('span', class_='play_accordion__section-title-inner').text
            print(season + " -- " + article.get('id'))
            # All content for the given season

            ul = article.find('ul')
            if ul is None:
                print("null!")  # This should not happen

Example Output:

Season 1 -- section-season1-xxxx
Season 2 -- section-season2-xxxx
null!

https://www.svtplay.se/andra-aket (URL used in the example)


Upvotes: 0

Views: 65

Answers (1)

Andrej Kesely

Reputation: 195468

The page's HTML contains the episode markup only for season 1, not for the other seasons. However, the information is embedded in the page as JSON, and you can parse it with the re and json modules:

import json
import re
from pprint import pprint

import requests

url = 'https://www.svtplay.se/andra-aket?tab=season-1-18927182'

# The page assigns its data as a JSON object to root['__svtplay_apollo']
html = requests.get(url).text
data = json.loads(re.findall(r"root\['__svtplay_apollo'\] = (\{.*?\});", html)[0])

# pprint(data)  # <-- uncomment this to see all the data

for k in data:
    if k.startswith('Episode:') or (k.startswith('$Episode:') and k.endswith('urls')):
        print(k)
        pprint(data[k])
        print('-' * 80)

Prints (data about episodes 1 and 2 and their URLs):

Episode:1383301-001
{'__typename': 'Episode',
 'accessibilities': {'json': ['AudioDescribed', 'SignInterpreted'],
                     'type': 'json'},
 'duration': 1700,
 'id': '1383301-001',
 'image': {'generated': False,
           'id': 'Image:18926434',
           'type': 'id',
           'typename': 'Image'},
 'live': None,
 'longDescription': 'Madde och Petter flyttar tillsammans med sin 13-åriga '
                    'dotter Ida till Björkfjället, en liten skidort i svenska '
                    'fjällen. Madde är uppvuxen där men för '
                    'Stockholms-hipstern Petter är det ett chockartat '
                    'miljöombyte. Maddes mamma Ingegerd har gått i pension och '
                    'lämnat över ansvaret för familjens lilla hotell till '
                    'Madde. Hon och Petter ska nu driva "Gammelgården" med '
                    'Maddes bror Tommy, vilket visar sig vara en inte helt '
                    'lätt uppgift. I rollerna: Sanna Sundqvist, Jakob '
                    'Setterberg, William Spetz, Bert-Åke Varg, Mattias '
                    'Fransson och Lena T Hansson. Del 1 av 8.',
 'name': 'Avsnitt 1',
 'nameRaw': '',
 'positionInSeason': 'Säsong 1 — Avsnitt 1',
 'restrictions': {'generated': True,
                  'id': '$Episode:1383301-001.restrictions',
                  'type': 'id',
                  'typename': 'Restrictions'},
 'slug': 'avsnitt-1',
 'svtId': 'jBD1gw8',
 'urls': {'generated': True,
          'id': '$Episode:1383301-001.urls',
          'type': 'id',
          'typename': 'Urls'},
 'validFrom': '2019-07-25T02:00:00+02:00',
 'validFromFormatted': 'Tor 25 jul 02:00',
 'validTo': '2020-01-21T23:59:00+01:00',
 'variants': [{'generated': False,
               'id': 'Variant:1383301-001A',
               'type': 'id',
               'typename': 'Variant'},
              {'generated': False,
               'id': 'Variant:1383301-001S',
               'type': 'id',
               'typename': 'Variant'},
              {'generated': False,
               'id': 'Variant:1383301-001T',
               'type': 'id',
               'typename': 'Variant'}],
 'videoSvtId': '8PbQdAj'}
--------------------------------------------------------------------------------
$Episode:1383301-001.urls
{'__typename': 'Urls',
 'svtplay': '/video/19970142/andra-aket/andra-aket-sasong-1-avsnitt-1'}
--------------------------------------------------------------------------------

... and so on.
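
As the output above shows, each Episode entry does not hold its link directly; its 'urls' field points at a separate '$Episode:<id>.urls' entry. A minimal sketch of following that reference to build absolute links might look like the following. The helper names extract_state and episode_links, the BASE_URL constant, and the tiny sample page are my own illustration rather than anything provided by the site, and the sample only mimics the structure printed above. It runs without network access.

```python
import json
import re
from urllib.parse import urljoin

# Assumption: site root, taken from the URL in the question.
BASE_URL = "https://www.svtplay.se"

# Same pattern as in the code above.
STATE_RE = re.compile(r"root\['__svtplay_apollo'\] = (\{.*?\});")


def extract_state(html):
    """Return the embedded JSON as a dict, or None if the pattern is absent."""
    match = STATE_RE.search(html)
    return json.loads(match.group(1)) if match else None


def episode_links(data):
    """Yield (name, absolute_url) for every Episode entry in the data."""
    for key, value in data.items():
        if not key.startswith("Episode:"):
            continue
        # Follow the reference to the separate '$Episode:<id>.urls' entry.
        urls = data.get(value["urls"]["id"], {})
        path = urls.get("svtplay")
        if path:
            yield value["name"], urljoin(BASE_URL, path)


# Hand-made sample mimicking the printed structure (not real page source).
sample_data = {
    "Episode:1383301-001": {
        "name": "Avsnitt 1",
        "urls": {"id": "$Episode:1383301-001.urls"},
    },
    "$Episode:1383301-001.urls": {
        "svtplay": "/video/19970142/andra-aket/andra-aket-sasong-1-avsnitt-1",
    },
}
sample_html = (
    "<script>root['__svtplay_apollo'] = "
    + json.dumps(sample_data)
    + ";</script>"
)

state = extract_state(sample_html)
links = list(episode_links(state))
for name, link in links:
    print(name, link)
```

With a real page you would pass requests.get(url).text to extract_state instead of sample_html, and check for None first, since the pattern will stop matching if the site changes how it embeds the data.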

Upvotes: 2
