Reputation: 9538
I am trying to get the links from a webpage. I have managed to select the image next to each link I need, but when I call next_sibling on the image, I get None.
Here's my attempt:
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}
response = requests.get('http://index-of.es/Python/', headers=headers)
soup = BeautifulSoup(response.text, 'lxml')
#print(soup.select('a img'))
for item in soup.select("a img"):
    print(item.next_sibling)
The code works if I use print(item), but when I try to get the next sibling, I only get None.
Any ideas?
Upvotes: 2
Views: 165
Reputation: 84465
You are walking the wrong way. next_sibling only looks at nodes that share the same parent, and the img has nothing after it inside its a tag, so you get None. The link is on the parent node of the img. I would also use a more selective CSS selector to get the right img nodes:
for item in soup.select("[alt='[ ]']"):
    print('http://index-of.es/Python/' + item.parent['href'])
Of course, if you don't care about the img, then use :has (bs4 4.7.1+) to specify that the parent a has a child with a particular alt value:
print(['http://index-of.es/Python/' + i['href'] for i in soup.select("a:has([alt='[ ]'])")])
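To see the difference between the approaches without hitting the site, here is a minimal sketch on a single hand-written row. The markup, the file name book.pdf and the icon path are assumptions made up for illustration, not copied from the real page, and html.parser is used only so the snippet does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one row of the directory listing
html = (
    '<a href="book.pdf"><img src="/icons/unknown.gif" alt="[ ]"></a> '
    '<a href="book.pdf">book.pdf</a>'
)
soup = BeautifulSoup(html, 'html.parser')

img = soup.select_one("[alt='[ ]']")

# The img is the only node inside its <a>, so it has no next sibling
sibling = img.next_sibling

# The enclosing <a> is the node that carries the href
parent_href = img.parent['href']

# Same result without touching the img: select the <a> that contains it
has_hrefs = [a['href'] for a in soup.select("a:has([alt='[ ]'])")]

print(sibling, parent_href, has_hrefs)
```

If the real page nests things differently, print(item.parent) on one of the selected img nodes should show which element actually holds the href.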
Upvotes: 1
Reputation: 9538
I searched a lot until I figured it out like this:
for item in soup.select("a img"):
    link = item.find_next('a')
    # Skip images with no following link, and skip hrefs that start with '/'
    if link and link.get('href') and not link['href'].startswith('/'):
        print('http://index-of.es/Python/' + link['href'])
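A side note on building the URLs: plain string concatenation only works for hrefs that are relative to the current folder, which is probably why the ones starting with '/' have to be skipped. A possible alternative is urljoin from the standard library, which resolves both kinds against the page URL. The href values below are hypothetical examples, not taken from the actual listing.

```python
from urllib.parse import urljoin

base = 'http://index-of.es/Python/'

# Hypothetical href values of the kind a directory listing might contain
hrefs = ['book.pdf', '/icons/back.gif', 'sub/notes.txt']

# urljoin resolves folder-relative and root-relative hrefs against the base
absolute = [urljoin(base, h) for h in hrefs]

for url in absolute:
    print(url)
```

Whether the root-relative links should be kept or filtered out still depends on what you want from the page; urljoin only makes sure the ones you keep are well-formed.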
Upvotes: 0