I want to count all the <a> tags with the class name md-headline that are located before the link whose title is "Dupont Lewis".
To find the position of that link ("Dupont Lewis") within the page, I am using the following code:
import requests
from bs4 import BeautifulSoup
url = 'https://www.sortlist.fr/pub'
response= requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
print(soup.prettify())
soup.a = soup.find_all("a", {"class": "md-headline"})
search = soup.select_one('a[title*="Dupont Lewis"]')
if search:
    position = find_all_previous('a[title*="Dupont Lewis"]')
    print(position.count)
else:
    print('None')
But I keep on getting 0 for some reason.
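Call find_all_previous on the link itself to get the md-headline links that come before it; len(previous_md_headlines) then gives the count:
link = soup.select_one('a[title*="Dupont Lewis"]')
previous_md_headlines = link.find_all_previous("a", {"class": "md-headline"})
Likewise, find_all_next returns the md-headline links that come after it:
next_md_headlines = link.find_all_next("a", {"class": "md-headline"})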
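A minimal offline sketch of these two calls, run against a small hypothetical HTML snippet instead of the live page (the sample markup and every title other than "Dupont Lewis" are made up for illustration). Both methods are called on the Tag returned by select_one, take a tag name plus an attribute filter rather than a CSS selector, and return a list, so len() gives the count.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
html = """
<div>
  <a class="md-headline" title="Agency A">Agency A</a>
  <a class="md-headline" title="Dupont Lewis">Dupont Lewis</a>
  <a class="other" title="Not counted">Not counted</a>
  <a class="md-headline" title="Agency B">Agency B</a>
  <a class="md-headline" title="Agency C">Agency C</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
link = soup.select_one('a[title*="Dupont Lewis"]')

# Methods of the Tag, filtered by tag name and class (not a CSS selector).
# The class filter matches any tag whose class list contains "md-headline".
previous_md_headlines = link.find_all_previous("a", {"class": "md-headline"})
next_md_headlines = link.find_all_next("a", {"class": "md-headline"})

print(len(previous_md_headlines))  # 1
print(len(next_md_headlines))      # 2
```

find_all_previous walks backwards from the link, so its results are ordered nearest first.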
Why are there no anchor elements with class md-headline prior to the link with title "Dupont Lewis"?
On the web page "https://www.sortlist.fr/pub", the first anchor element with class md-headline also happens to be the anchor element with the title "Dupont Lewis", which is why the count of previous elements will always be zero (unless the web page changes).
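Another way to arrive at the same count, sketched here under the assumption that every link of interest matches the CSS selector a.md-headline: select() returns matches in document order, so the index of the "Dupont Lewis" link in that list is the number of md-headline links before it. The sample markup is hypothetical and reproduces the case where that link comes first.

```python
from bs4 import BeautifulSoup

# Hypothetical markup where "Dupont Lewis" is the first md-headline link.
html = """
<a class="s-block md-headline" title="Dupont Lewis">Dupont Lewis</a>
<a class="s-block md-headline" title="The Collective Story">The Collective Story</a>
<a class="s-block md-headline" title="The Crew Communication">The Crew Communication</a>
"""

soup = BeautifulSoup(html, "html.parser")
link = soup.select_one('a[title*="Dupont Lewis"]')

# select() returns matches in document order.
headlines = soup.select("a.md-headline")

# Compare by identity: bs4 Tag equality is structural, so list.index() could
# match a different tag with identical markup. This raises StopIteration if
# the link itself does not carry the md-headline class.
position = next(i for i, a in enumerate(headlines) if a is link)
after = len(headlines) - position - 1

print(position)  # 0
print(after)     # 2
```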
import requests
from bs4 import BeautifulSoup
url = 'https://www.sortlist.fr/pub'
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
link = soup.select_one('a[title*="Dupont Lewis"]')
print(f"link: {link}")
previous_md_headlines = link.find_all_previous("a", {"class": "md-headline"})
next_md_headlines = link.find_all_next("a", {"class": "md-headline"})
print(f"\n\nFound {len(previous_md_headlines)} previous md-headlines.")
print("Previous md-headline links:\n")
print(*previous_md_headlines, sep="\n\n")
print(f"Found {len(next_md_headlines)} next md-headlines.")
print("Next md-headline links:\n")
print(*next_md_headlines, sep="\n\n")
Output:
link: <a class="s-block s-bold md-headline md-padding s-pb0 md-truncate" ng-click='setExpertiseAndLocation({"expertise":{"id":84,"name":"Publicité","title":"Agences de Publicité","slug":"pub","imageUrl":"/images/expertises/84.jpg"}})' sl-link="xx-L2FnZW5jeS9kdXBvbnQtbGV3aXM=" target="_blank" title="Dupont Lewis">Dupont Lewis</a>
Found 0 previous md-headlines.
Previous md-headline links:
Found 49 next md-headlines.
Next md-headline links:
<a class="s-block s-bold md-headline md-padding s-pb0 md-truncate" ng-click='setExpertiseAndLocation({"expertise":{"id":84,"name":"Publicité","title":"Agences de Publicité","slug":"pub","imageUrl":"/images/expertises/84.jpg"}})' sl-link="xx-L2FnZW5jeS9jb25jZXB0b3J5LTVmMjliMzFhLWExY2YtNDRlYS1iYzA4LWJiMzg2MTkyMmM1OQ==" target="_blank" title="The Collective Story">The Collective Story</a>
<a class="s-block s-bold md-headline md-padding s-pb0 md-truncate" ng-click='setExpertiseAndLocation({"expertise":{"id":84,"name":"Publicité","title":"Agences de Publicité","slug":"pub","imageUrl":"/images/expertises/84.jpg"}})' sl-link="xx-L2FnZW5jeS90aGUtY3Jldw==" target="_blank" title="The Crew Communication">The Crew Communication</a>
<a class="s-block s-bold md-headline md-padding s-pb0 md-truncate" ng-click='setExpertiseAndLocation({"expertise":{"id":84,"name":"Publicité","title":"Agences de Publicité","slug":"pub","imageUrl":"/images/expertises/84.jpg"}})' sl-link="xx-L2FnZW5jeS9ub3ZlbWJyZQ==" target="_blank" title="Novembre - Creative Business Partner">Novembre - Creative Business Partner</a>
...
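The script above assumes the link is present; if the page changes, select_one returns None and the find_all_previous call raises AttributeError. A small helper that guards against this might look like the sketch below. The function name count_md_headlines_around is hypothetical, and it takes HTML text (for example response.text from the request above) so it can be exercised offline.

```python
from bs4 import BeautifulSoup


def count_md_headlines_around(html, title):
    """Return (before, after) counts of <a class="md-headline"> links relative
    to the first link whose title contains `title`, or None if there is none.

    Assumes `title` contains no double quotes, since it is inserted into a
    CSS attribute selector as-is.
    """
    soup = BeautifulSoup(html, "html.parser")
    link = soup.select_one(f'a[title*="{title}"]')
    if link is None:
        return None
    before = len(link.find_all_previous("a", {"class": "md-headline"}))
    after = len(link.find_all_next("a", {"class": "md-headline"}))
    return before, after


# Hypothetical sample markup; with the live page you would pass response.text.
sample = (
    '<a class="md-headline" title="Agency A">Agency A</a>'
    '<a class="md-headline" title="Dupont Lewis">Dupont Lewis</a>'
    '<a class="md-headline" title="Agency B">Agency B</a>'
)
print(count_md_headlines_around(sample, "Dupont Lewis"))  # (1, 1)
print(count_md_headlines_around(sample, "Nobody"))        # None
```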