YasserKhalil

Reputation: 9538

Get next_sibling in BeautifulSoup in Python

I am trying to get the links from a webpage. I have managed to select the image next to each link I need, but when I use next_sibling on it I get None. Here is my attempt:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}

response = requests.get('http://index-of.es/Python/', headers=headers)
soup = BeautifulSoup(response.text, 'lxml')
#print(soup.select('a img'))
for item in soup.select("a img"):
    print(item.next_sibling)

The code works if I use print(item), but when I try to get the next sibling I only get None. Any ideas?

Upvotes: 2

Views: 165

Answers (2)

QHarr

Reputation: 84465

You are walking the wrong way. The link is on the parent node of the img, so go up to the parent rather than across to a sibling. I would also use a more selective CSS selector to get the right img nodes:

for item in soup.select("[alt='[   ]']"):
    print('http://index-of.es/Python/' + item.parent['href'])
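To illustrate why next_sibling comes back as None, here is a minimal sketch on a small inline snippet. The markup in SAMPLE_HTML is an assumption about what the listing roughly looks like (an img wrapped in its own a, followed by a second a with the file name); it is not copied from the real page. The names SAMPLE_HTML and BASE_URL are made up for the example, and html.parser is used only so the sketch does not depend on lxml.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = 'http://index-of.es/Python/'

# Hypothetical markup: an assumption about the listing's structure.
SAMPLE_HTML = (
    '<table>'
    '<tr><td><a href="/"><img src="/icons/back.gif" alt="[DIR]"></a></td>'
    '<td><a href="/">Parent Directory</a></td></tr>'
    '<tr><td><a href="book.pdf"><img src="/icons/pdf.gif" alt="[   ]"></a></td>'
    '<td><a href="book.pdf">book.pdf</a></td></tr>'
    '</table>'
)

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
img = soup.select_one("[alt='[   ]']")

# The img is the only child of its <a>, so there is nothing next to it.
sibling = img.next_sibling

# The href lives one level up, on the enclosing <a>.
parent_href = img.parent['href']

# urljoin resolves the relative href against the listing URL.
full_url = urljoin(BASE_URL, parent_href)

print(sibling)
print(full_url)
```

Using urljoin instead of string concatenation is a design choice of this sketch: it also handles hrefs that start with '/'.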

Of course, if you don't need the img itself, use :has (bs4 4.7.1+) to select each parent a that has a child with that particular alt value:

print(['http://index-of.es/Python/' + i['href'] for i in soup.select("a:has([alt='[   ]'])")])
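The :has variant can be tried offline in the same way. This is a sketch under the same assumption about the markup (SAMPLE_HTML is hypothetical, not the real page), with html.parser standing in for lxml:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = 'http://index-of.es/Python/'

# Hypothetical markup: an assumption about the listing's structure.
SAMPLE_HTML = (
    '<table>'
    '<tr><td><a href="/"><img src="/icons/back.gif" alt="[DIR]"></a></td>'
    '<td><a href="/">Parent Directory</a></td></tr>'
    '<tr><td><a href="book.pdf"><img src="/icons/pdf.gif" alt="[   ]"></a></td>'
    '<td><a href="book.pdf">book.pdf</a></td></tr>'
    '</table>'
)

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Select every <a> that contains an element with the file-icon alt text,
# then resolve each href against the listing URL.
file_links = [
    urljoin(BASE_URL, a['href'])
    for a in soup.select("a:has([alt='[   ]'])")
]

print(file_links)
```

Only the a elements that wrap a matching img are selected, so the plain-text name links and the parent-directory row are skipped.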

Upvotes: 1

YasserKhalil

Reputation: 9538

After a lot of searching, I worked it out like this:

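for item in soup.select("a img"):
    # find_next returns the next <a> in document order, or None if there is none
    link = item.find_next('a')
    href = link.get('href') if link is not None else None
    # Skip missing hrefs and absolute paths such as the parent-directory link
    if href and not href.startswith('/'):
        print('http://index-of.es/Python/' + href)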
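A likely reason this works where next_sibling did not: find_next searches forward through the whole document rather than only among siblings. The sketch below shows the difference on a single hypothetical row (SAMPLE_ROW is an assumed structure, not the real page):

```python
from bs4 import BeautifulSoup

# Hypothetical row: an assumption about the listing's structure.
SAMPLE_ROW = (
    '<table><tr>'
    '<td><a href="book.pdf"><img src="/icons/pdf.gif" alt="[   ]"></a></td>'
    '<td><a href="book.pdf">book.pdf</a></td>'
    '</tr></table>'
)

soup = BeautifulSoup(SAMPLE_ROW, 'html.parser')
img = soup.select_one('a img')

# next_sibling only looks at nodes sharing the same parent as the img.
sibling = img.next_sibling

# find_next walks forward in document order, so it reaches the <a>
# in the following table cell.
next_link = img.find_next('a')

print(sibling)
print(next_link['href'], next_link.get_text())
```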

Upvotes: 0
