Reputation: 151
A screenshot of the website's HTML is here: https://i.sstatic.net/FEIAa.png
The code I am using:
import requests
import time
from bs4 import BeautifulSoup
import sys
sys.stdout = open("links.txt", "a")
for x in range(0, 2):
    try:
        URL = f'https://link.com/{x}'
        page = requests.get(URL)
        soup = BeautifulSoup(page.content, 'html.parser')
        rows = soup.find_all('div', id='view')
        for row in rows:
            print(row.text)
        time.sleep(5)
    except:
        continue
I only want the list of links highlighted in the screenshot. Instead, the output is the entire contents of the view div rather than just the href values, which are what I want.
Example of output that it produces:
<div id="view">
<a href="/watch/8f310ba6dfsdfsdfsdf" target="_blank"><img src="/thumbs/jpg/8f310ba6dfsdfsdfsdf.jpg" width="300"/></a>
...
...
What I want it to produce is:
/watch/8f310ba6dfsdfsdfsdf
...
...
Upvotes: 1
Views: 53
Reputation: 33384
Use the following code, which finds all anchor tags under the div tag and then gets each href value.
soup = BeautifulSoup(page.content, 'html.parser')
for link in soup.find('div', id='view').find_all('a'):
    print(link['href'])
If you are using BS4 4.7.1 or above, you can use the following CSS selector.
soup = BeautifulSoup(page.content, 'html.parser')
for link in soup.select('#view > a'):
    print(link['href'])
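A self-contained way to check both approaches offline, assuming the page markup looks like the sample output in the question. The first anchor is copied from that sample; the second anchor (example-second-id) is hypothetical, added only to show that every link is collected. The function names hrefs_with_find and hrefs_with_select are also made up for illustration.

```python
from bs4 import BeautifulSoup

# Markup modeled on the sample output in the question; the second
# anchor is a hypothetical extra entry added for illustration.
SAMPLE_HTML = """
<div id="view">
<a href="/watch/8f310ba6dfsdfsdfsdf" target="_blank"><img src="/thumbs/jpg/8f310ba6dfsdfsdfsdf.jpg" width="300"/></a>
<a href="/watch/example-second-id" target="_blank"><img src="/thumbs/jpg/example-second-id.jpg" width="300"/></a>
</div>
"""


def hrefs_with_find(html):
    """Find the div with id="view", then read href from each <a> inside it."""
    soup = BeautifulSoup(html, "html.parser")
    view = soup.find("div", id="view")
    if view is None:  # find() returns None when nothing matches
        return []
    return [a["href"] for a in view.find_all("a")]


def hrefs_with_select(html):
    """Same idea via a CSS selector: <a> elements that are direct children of #view."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.select("#view > a")]


if __name__ == "__main__":
    print(hrefs_with_find(SAMPLE_HTML))
    print(hrefs_with_select(SAMPLE_HTML))
```

Both functions should print the same two paths for this sample. The None check is an addition to the answer's one-liner: without it, find() returning None on a page with no view div would raise AttributeError.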
Upvotes: 2
Reputation: 581
By extracting the href attribute of the a tag inside the div, you can get your desired result:
rows = soup.find_all('div', id='view')
for row in rows:
    links = row.find_all('a')
    for link in links:
        print(link['href'])
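A rough sketch of how this nested loop might slot into the loop from the question. It is an assumption-laden rewrite, not the asker's code: it writes to links.txt through an explicit file handle instead of redirecting sys.stdout, and it catches requests.RequestException instead of using a bare except, so failed pages are reported rather than silently skipped. The names extract_hrefs and scrape, and the timeout and delay values, are hypothetical choices.

```python
import time

import requests
from bs4 import BeautifulSoup


def extract_hrefs(html):
    """Collect hrefs from every <a> inside each <div id="view"> (the answer's nested loop).

    Uses link.get("href") so anchors without an href are skipped instead of raising KeyError.
    """
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    for row in soup.find_all("div", id="view"):
        for link in row.find_all("a"):
            href = link.get("href")
            if href is not None:  # skip anchors that have no href attribute
                hrefs.append(href)
    return hrefs


def scrape(base_url, pages, out_path="links.txt", delay=5):
    """Fetch base_url/0 .. base_url/(pages-1) and append each href to out_path, one per line."""
    with open(out_path, "a", encoding="utf-8") as out:
        for x in range(pages):
            try:
                page = requests.get(f"{base_url}/{x}", timeout=10)
                page.raise_for_status()
            except requests.RequestException as exc:
                # Report the failure instead of silently skipping the page
                print(f"skipping page {x}: {exc}")
                continue
            for href in extract_hrefs(page.content):
                out.write(href + "\n")
            time.sleep(delay)  # pause between requests, as in the original loop
```

With the placeholder URL from the question, the call would be scrape("https://link.com", 2), which covers the same pages as range(0, 2). Keeping the parsing in its own function means it can be tried on saved HTML without any network access.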
Upvotes: 0
Reputation: 1710
You are retrieving the whole content of the div tag. If you want only the links within the div, you need to add the a tag to the CSS selector, as follows:
links = soup.select('div[id="view"] a')
for link in links:
    print(link.get('href'))
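The selector in this answer differs slightly from the '#view > a' one in the first answer. A space (descendant combinator) matches a tags at any depth under the div, while > (child combinator) matches only direct children. Also, link.get('href') returns None for an anchor without an href, where link['href'] would raise KeyError. The small HTML string below is hypothetical and exists only to show these differences.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one anchor directly inside #view, one nested in a <span>,
# and one anchor with no href attribute at all.
HTML = """
<div id="view">
  <a href="/watch/direct">direct</a>
  <span><a href="/watch/nested">nested</a></span>
  <a name="placeholder">no href</a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Descendant combinator (space): matches <a> at any depth under the div.
descendant = [a.get("href") for a in soup.select('div[id="view"] a')]

# Child combinator (>): matches only <a> elements that are direct children.
children = [a.get("href") for a in soup.select("#view > a")]

print(descendant)  # ['/watch/direct', '/watch/nested', None]
print(children)    # ['/watch/direct', None]
```

For the page in the question, where each a sits directly inside the view div, both selectors should return the same links. The difference only matters if the markup nests anchors more deeply.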
Upvotes: 0