JhonSnow

Reputation: 27

How to get the count of tags with a specific class before a certain element using Beautiful Soup?

I want to count all the <a> tags that have the class md-headline and appear before the link whose title contains "Dupont Lewis".
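On the real page each of these anchors carries several classes at once (for example `s-block s-bold md-headline md-padding s-pb0 md-truncate`). The sketch below uses small hypothetical markup, not the live site, to illustrate that Beautiful Soup treats `class` as a multi-valued attribute: both the `{"class": "md-headline"}` filter and the CSS selector `a.md-headline` match any anchor whose class list includes `md-headline`, and `[title*="..."]` is a substring match on the title.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled loosely on the page: some links carry several
# classes, one of which is md-headline; one link lacks that class entirely.
html = """
<div>
  <a class="s-block md-headline" title="Agency A">Agency A</a>
  <a class="s-block" title="Not a headline">Other</a>
  <a class="s-block s-bold md-headline md-truncate" title="Dupont Lewis">Dupont Lewis</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# "class" is multi-valued, so this matches tags whose class list contains
# md-headline, not only tags whose class string is exactly "md-headline".
by_dict = soup.find_all("a", {"class": "md-headline"})
by_css = soup.select("a.md-headline")

# [title*="..."] matches when the title attribute contains the substring.
target = soup.select_one('a[title*="Dupont"]')

print(len(by_dict), len(by_css), target["title"])
```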

To find the position of the "Dupont Lewis" link within the page, I am using the following code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.sortlist.fr/pub'
response = requests.get(url)

soup = BeautifulSoup(response.content, "html.parser")
print(soup.prettify())

# Locate the link whose title contains "Dupont Lewis"
search = soup.select_one('a[title*="Dupont Lewis"]')
if search:
    # Collect the md-headline anchors that appear before that link
    position = search.find_all_previous("a", {"class": "md-headline"})
    print(len(position))
else:
    print('None')

But I keep on getting 0 for some reason.

Upvotes: 0

Views: 578

Answers (1)

Christopher Peisert

Reputation: 24134

Find all previous elements

link = soup.select_one('a[title*="Dupont Lewis"]')
previous_md_headlines = link.find_all_previous("a", {"class": "md-headline"})
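Because the live page may change, here is an offline sketch on hypothetical markup (the titles "First", "Second", and so on are made up) showing what `find_all_previous` returns. It walks backwards from the link, so the count is the number of matching anchors before it, and the list is ordered nearest-first rather than in document order.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: two md-headline links, one link
# without that class, the target link, then one more md-headline link.
html = """
<ul>
  <li><a class="md-headline" title="First">First</a></li>
  <li><a class="md-headline" title="Second">Second</a></li>
  <li><a class="plain" title="Skipped">Skipped</a></li>
  <li><a class="md-headline" title="Dupont Lewis">Dupont Lewis</a></li>
  <li><a class="md-headline" title="Last">Last</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")
link = soup.select_one('a[title*="Dupont Lewis"]')

# Searches backwards from the link; the closest match comes first.
previous = link.find_all_previous("a", {"class": "md-headline"})
print(len(previous))                   # 2
print([a["title"] for a in previous])  # ['Second', 'First']
```

If you need the earlier anchors in page order, reverse the list with `previous[::-1]`.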

Find all next elements

link = soup.select_one('a[title*="Dupont Lewis"]')
next_md_headlines = link.find_all_next("a", {"class": "md-headline"})
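Using the same kind of hypothetical markup, the sketch below shows that `find_all_next` walks forward in document order. It also checks a simple sanity identity, which holds only under the assumption that the target link itself has the `md-headline` class and contains no nested `md-headline` anchors: previous + the link + next equals the total number of `md-headline` anchors.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page (titles are made up).
html = """
<ul>
  <li><a class="md-headline" title="First">First</a></li>
  <li><a class="md-headline" title="Second">Second</a></li>
  <li><a class="plain" title="Skipped">Skipped</a></li>
  <li><a class="md-headline" title="Dupont Lewis">Dupont Lewis</a></li>
  <li><a class="md-headline" title="Last">Last</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")
link = soup.select_one('a[title*="Dupont Lewis"]')

previous = link.find_all_previous("a", {"class": "md-headline"})
following = link.find_all_next("a", {"class": "md-headline"})
everything = soup.find_all("a", {"class": "md-headline"})

# find_all_next walks forward, so results are in document order.
print([a["title"] for a in following])  # ['Last']

# The link is itself an md-headline, so before + link + after covers them all.
print(len(previous) + 1 + len(following) == len(everything))  # True
```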

On the web page "https://www.sortlist.fr/pub", the first anchor element with class md-headline is the "Dupont Lewis" link itself. No md-headline anchors come before it, so the count of previous elements is zero (and will stay zero unless the page content changes).
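To see that the zero is a property of the page layout rather than a bug in the lookup, here is a small sketch with a hypothetical helper, `count_headlines_before`, run on two made-up snippets: one where the target is the first md-headline link (as on the live page at the time of writing) and one where it is second.

```python
from bs4 import BeautifulSoup


def count_headlines_before(html: str, title_fragment: str) -> int:
    """Hypothetical helper: count md-headline anchors that precede the first
    anchor whose title contains title_fragment."""
    soup = BeautifulSoup(html, "html.parser")
    target = soup.select_one(f'a[title*="{title_fragment}"]')
    if target is None:
        raise ValueError(f"no link with a title containing {title_fragment!r}")
    return len(target.find_all_previous("a", {"class": "md-headline"}))


# Target is the first md-headline link (the situation described above).
first = """<a class="md-headline" title="Dupont Lewis">Dupont Lewis</a>
<a class="md-headline" title="The Collective Story">The Collective Story</a>"""

# Same links, but the target is listed second.
second = """<a class="md-headline" title="The Collective Story">The Collective Story</a>
<a class="md-headline" title="Dupont Lewis">Dupont Lewis</a>"""

print(count_headlines_before(first, "Dupont Lewis"))   # 0
print(count_headlines_before(second, "Dupont Lewis"))  # 1
```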

Complete Example

import requests
from bs4 import BeautifulSoup

url = 'https://www.sortlist.fr/pub'
response = requests.get(url)
response.raise_for_status()  # fail early on an HTTP error status
soup = BeautifulSoup(response.content, "html.parser")

# Locate the link whose title contains "Dupont Lewis"
link = soup.select_one('a[title*="Dupont Lewis"]')
if link is None:
    raise SystemExit('No link with a title containing "Dupont Lewis" was found.')
print(f"link: {link}")

# md-headline anchors before the link (nearest first) and after it
previous_md_headlines = link.find_all_previous("a", {"class": "md-headline"})
next_md_headlines = link.find_all_next("a", {"class": "md-headline"})

print(f"\n\nFound {len(previous_md_headlines)} previous md-headlines.")
print("Previous md-headline links:\n")
print(*previous_md_headlines, sep="\n\n")

print(f"Found {len(next_md_headlines)} next md-headlines.")
print("Next md-headline links:\n")
print(*next_md_headlines, sep="\n\n")

Output

link: <a class="s-block s-bold md-headline md-padding s-pb0 md-truncate" ng-click='setExpertiseAndLocation({"expertise":{"id":84,"name":"Publicité","title":"Agences de Publicité","slug":"pub","imageUrl":"/images/expertises/84.jpg"}})' sl-link="xx-L2FnZW5jeS9kdXBvbnQtbGV3aXM=" target="_blank" title="Dupont Lewis">Dupont Lewis</a>


Found 0 previous md-headlines.
Previous md-headline links:

Found 49 next md-headlines.
Next md-headline links:

<a class="s-block s-bold md-headline md-padding s-pb0 md-truncate" ng-click='setExpertiseAndLocation({"expertise":{"id":84,"name":"Publicité","title":"Agences de Publicité","slug":"pub","imageUrl":"/images/expertises/84.jpg"}})' sl-link="xx-L2FnZW5jeS9jb25jZXB0b3J5LTVmMjliMzFhLWExY2YtNDRlYS1iYzA4LWJiMzg2MTkyMmM1OQ==" target="_blank" title="The Collective Story">The Collective Story</a>

<a class="s-block s-bold md-headline md-padding s-pb0 md-truncate" ng-click='setExpertiseAndLocation({"expertise":{"id":84,"name":"Publicité","title":"Agences de Publicité","slug":"pub","imageUrl":"/images/expertises/84.jpg"}})' sl-link="xx-L2FnZW5jeS90aGUtY3Jldw==" target="_blank" title="The Crew Communication">The Crew Communication</a>

<a class="s-block s-bold md-headline md-padding s-pb0 md-truncate" ng-click='setExpertiseAndLocation({"expertise":{"id":84,"name":"Publicité","title":"Agences de Publicité","slug":"pub","imageUrl":"/images/expertises/84.jpg"}})' sl-link="xx-L2FnZW5jeS9ub3ZlbWJyZQ==" target="_blank" title="Novembre - Creative Business Partner">Novembre - Creative Business Partner</a>
...
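If what you want is the link's position among all md-headline links, an alternative sketch (not from the original answer, and shown on hypothetical markup) is to list the headlines in document order and look up the target by identity. Identity matters here because Beautiful Soup compares tags by name, attributes and contents, so `list.index()` could stop at an identical-looking tag earlier on the page. The sketch assumes the target itself carries the md-headline class; otherwise `position` is `None`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; agency names are made up.
html = """
<a class="md-headline" title="Agency A">Agency A</a>
<a class="md-headline" title="Agency B">Agency B</a>
<a class="md-headline" title="Dupont Lewis">Dupont Lewis</a>
<a class="md-headline" title="Agency C">Agency C</a>
"""
soup = BeautifulSoup(html, "html.parser")

headlines = soup.select("a.md-headline")  # document order
target = soup.select_one('a[title*="Dupont Lewis"]')

# Match by object identity rather than equality; None if not a headline.
position = next((i for i, a in enumerate(headlines) if a is target), None)
print(position)  # 2, i.e. two md-headline links come before the target
```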

Upvotes: 2
