Nicky

Reputation: 39

None returned when trying to get a tag's attribute value

In this HTML snippet from https://letterboxd.com/shesnicky/list/top-50-favourite-films/, I'm trying to loop over all the li tags and read the value of the 'data-target-link' attribute, so I can build a link to the page for each film. However, every time I try to get the data, I get None or a related error.

<li class="poster-container numbered-list-item" data-owner-rating="10"> <div class="poster film-poster really-lazy-load" data-image-width="125" data-image-height="187" data-film-slug="/film/donnie-darko/" data-linked="linked" data-menu="menu" data-target-link="/film/donnie-darko/" > <img src="https://s3.ltrbxd.com/static/img/empty-poster-125.c6227b2a.png" class="image" width="125" height="187" alt="Donnie Darko"/><span class="frame"><span class="frame-title"></span></span> </div> <p class="list-number">1</p> </li>

I'm going to use the links to grab images for a Twitter bot, so I tried this in my code:

class BotStreamer(tweepy.StreamListener):

    print "Bot Streamer"
    #on_data method of Tweepy’s StreamListener 
    #passes data from statuses to the on_status method
    def on_status(self, status):
        print "on status"
        link = 'https://letterboxd.com/shesnicky/list/top-50-favourite-films/'
        page = requests.get(link)
        soup = BS(page.content, 'html.parser')
        movies_ul = soup.find('ul', {'class':'poster-list -p125 -grid film-list'})

        movies = []
        for mov in movies_ul.find('data-film-slug'):
            movies.append(mov)

        rand = randint(0,51)
        newLink = "https://letterboxd.com%s" % (str(movies[rand]))
        newPage = requests.get(newLink)
        code = BS(newPage.content, 'html.parser')
        code_div = code.find\
                   ('div', {'class':'react-component film-poster film-poster-51910 poster'})

        image = code_div.find('img')
        url = image.get('src')

        username = status.user.screen_name
        status_id = status.id
        tweet_reply(url, username, status_id)

However, I kept getting errors about the list index being out of range, or about not being able to iterate over NoneType. So I wrote a test program just to see if I could get the data at all:

 import requests
 from bs4 import BeautifulSoup as BS

 link = 'https://letterboxd.com/shesnicky/list/top-50-favourite-films/'
 page = requests.get(link)
 soup = BS(page.content, 'html.parser')
 movies_ul = soup.find('ul', {'class':'poster-list -p125 -grid film-list'})
 more = movies_ul.find('li', {'class':'poster-container numbered-list-item'})
 k = more.find('data-target-link')
 print k

And again, all I get is None. Any help greatly appreciated.

Upvotes: 0

Views: 199

Answers (1)

furas

Reputation: 142631

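Read the docs: find() expects a tag name as its first argument, not an attribute name. Calling find('data-target-link') therefore searches for a <data-target-link> tag, which doesn't exist, so it returns None.

To match on the attribute instead, you can do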

soup.find('div', {'data-target-link': True})

or

soup.find(attrs={'data-target-link': True})
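
To see the difference without hitting the network, here is a minimal sketch that runs against a shortened copy of the snippet from the question. The html string and the variable names are only for illustration; the calls themselves are the same find() lookups shown above.

```python
from bs4 import BeautifulSoup

# Shortened copy of the <li> from the question (some attributes dropped).
html = (
    '<ul><li class="poster-container numbered-list-item">'
    '<div class="poster film-poster" data-film-slug="/film/donnie-darko/" '
    'data-target-link="/film/donnie-darko/">'
    '<img class="image" alt="Donnie Darko"/></div>'
    '<p class="list-number">1</p></li></ul>'
)

soup = BeautifulSoup(html, 'html.parser')

# Treated as a tag name: there is no <data-target-link> tag, so this is None.
by_tag_name = soup.find('data-target-link')

# Treated as an attribute filter: True matches any tag that has the attribute.
div = soup.find('div', attrs={'data-target-link': True})

# Attribute values are read like dictionary entries.
target = div['data-target-link']

# .get() returns None instead of raising KeyError when the attribute is absent.
missing = div.get('data-missing')

print(by_tag_name, target, missing)
```

Indexing with div['...'] raises KeyError if the attribute is missing, so .get() may be the safer choice when not every tag is guaranteed to carry it.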

Full example

import requests
from bs4 import BeautifulSoup as BS

link = 'https://letterboxd.com/shesnicky/list/top-50-favourite-films/'
page = requests.get(link)
soup = BS(page.content, 'html.parser')

all_items = soup.find_all('div', {'data-target-link': True})

for item in all_items:
    print(item['data-target-link'])
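
As for the "list index out of range" error in the question: randint(0, 51) includes both endpoints, so it can return an index past the end of a list of about 50 items. A possible way around that, sketched below for Python 3, is to let random.choice pick from whatever was actually collected. pick_film_url and BASE_URL are hypothetical names for this sketch, and paths stands in for the values printed by the loop above.

```python
import random
from urllib.parse import urljoin

# Assumed site root; the paths from data-target-link are relative to it.
BASE_URL = 'https://letterboxd.com'


def pick_film_url(paths, base=BASE_URL):
    """Return the full URL for one randomly chosen film path."""
    if not paths:
        # Nothing was scraped, e.g. because the page layout changed.
        raise ValueError('no film paths were collected')
    # random.choice never picks an index outside the list.
    return urljoin(base, random.choice(paths))


# Stand-in for the values collected from item['data-target-link'].
paths = ['/film/donnie-darko/']
print(pick_film_url(paths))
```

Checking for an empty list first also gives a clearer error than an IndexError if the scrape returns nothing.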

Upvotes: 1
