leopheard

Reputation: 101

Unsure why BeautifulSoup code won't scrape site

I've used BeautifulSoup a fair bit, and the other Kodi addons I've made with it work fine, so I'm unsure why this one won't scrape. Could someone look at the code below and spot what I'm missing?

The addon doesn't throw any Python error; it just shows an empty GUI screen. If the title or image scraping worked and only the link were wrong, the addon would still show a title and image, and the link would simply fail when clicked. So the problem seems to be in the title/image part. I've even tried commenting out the image section so it only looks for a link and title, but I still get nothing.

Link being scraped: https://store.counterpunch.org/feed/podcast/

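import requests
from bs4 import BeautifulSoup

def get_soup1(url1):
    # Fetch the feed and parse it (no custom headers are sent here).
    page = requests.get(url1)
    soup1 = BeautifulSoup(page.text, 'html.parser')
    print("type: ", type(soup1))
    return soup1

soup1 = get_soup1("https://store.counterpunch.org/feed/podcast/")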

def get_playable_podcast1(soup1):
    subjects = []
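    for content in soup1.find_all('item', limit=9):
        try:
            # Each <item> carries the audio URL in its <enclosure> tag.
            link = content.find('enclosure')
            link = link.get('url')
            print("\n\nLink: ", link)
            title = content.find('title')
            title = title.get_text()
        except AttributeError:
            # Silently skip items that lack an enclosure or a title.
            continue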
        item = {
                'url': link,
                'title': title,
                'thumbnail': "https://is2-ssl.mzstatic.com/image/thumb/Podcasts71/v4/71/55/88/71558834-c449-9ac3-e327-cad002e305b4/mza_4409042347411679857.jpg/600x600bb.jpg",
        }
        subjects.append(item)
    return subjects

def compile_playable_podcast1(playable_podcast1):
    items = []
    for podcast in playable_podcast1:
        items.append({
            'label': podcast['title'],
            'thumbnail': podcast['thumbnail'],
            'path': podcast['url'],
            'is_playable': True,
        })
    return items

Upvotes: 1

Views: 64

Answers (1)

QHarr

Reputation: 84465

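You need to send a User-Agent header with the request. Without one, the server presumably does not return the feed, so find_all('item') matches nothing and the list stays empty: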

def get_soup1(url1):
    page = requests.get(url1, headers={'User-Agent': 'Mozilla/5.0'})
    soup1 = BeautifulSoup(page.text, 'html.parser')
    print("type: ", type(soup1))
    return soup1
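
Because the original code failed silently, it may help to make a rejected request visible instead of parsing whatever comes back. Below is a minimal sketch of the same User-Agent idea using only the standard library's urllib rather than requests (a swapped-in technique, not what the addon uses). The names build_request, fetch_feed and USER_AGENT are hypothetical, and the 10-second timeout is an assumption.

```python
import urllib.error
import urllib.request

# Same header value as in the answer above.
USER_AGENT = "Mozilla/5.0"


def build_request(url):
    """Build a request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_feed(url, timeout=10):
    """Return the feed body as text, or None if the request fails.

    Failures are printed so that an empty result is not mistaken
    for a parsing problem further down the line.
    """
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        print("Request failed with HTTP status", err.code)
        return None
    except urllib.error.URLError as err:
        # The server could not be reached at all.
        print("Could not reach server:", err.reason)
        return None
```

If fetch_feed returns None, the problem is in the request rather than in the BeautifulSoup parsing, which narrows down where to look. With requests, checking page.status_code or calling page.raise_for_status() before parsing serves the same purpose.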

Upvotes: 1
