Finding sibling tag in BeautifulSoup with no attributes

Question

Sorry, kind of a beginner question about BeautifulSoup, but I can't find the answer.

I'm having trouble figuring out how to scrape HTML tags without attributes.

Here's the relevant section of the HTML.


 
  No-Lobbying List
 
 
  
   6/24/2019
  
  
   
    Brian Manley, Chief of Police, Austin Police Department
   
   
    
   
  
  
   
    Preliminary 2018 Annual Crime Report - Executive Summary

How can I navigate to the tag with the text "Preliminary 2018 Annual Crime Report - Executive Summary"?

I have tried moving from a tag with an attribute and using .next_sibling, but I've failed miserably.

Thank you.

trgrewy = soup.findAll('tr', {'bgcolor':'#efefef'}) #the cells alternate colors
trwhite = soup.findAll('tr', {'bgcolor':'#ffffff'}) 
trs = trgrewy + trwhite #merge them into a list
for item in trs:
    mdate = item.find('td', {'rowspan':'2'}) #find if it's today's date
    if mdate:
        datetime_object = datetime.strptime(mdate.text, '%m/%d/%Y')
        if datetime_object.date() == now.date():
            sender = item.find('a').text
            pdf = item.find('a')['href']
            link = baseurl + pdf
            title = item.findAll('td')[2] #this is where i've failed

Andrej Kesely · Accepted Answer

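You can use CSS selectors. The :has() pseudo-class matches the row that contains the date cell, and the + combinator then selects the row immediately after it (this needs Beautiful Soup 4.7 or newer, which uses the soupsieve package for selectors):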

data = '''

 
  No-Lobbying List
 
 
  
   6/24/2019
  
  
   
    Brian Manley, Chief of Police, Austin Police Department
   
   
    
   
  
  
   
    Preliminary 2018 Annual Crime Report - Executive Summary
   
  
 
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'lxml')

# This will find date
print(soup.select_one('td[rowspan="2"]').get_text(strip=True))

# This will find next row after the row with date
print(soup.select_one('tr:has(td[rowspan="2"]) + tr').get_text(strip=True))

Prints:

6/24/2019
Preliminary 2018 Annual Crime Report - Executive Summary
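
If you would rather stay with the find-style API from the question, here is a rough sketch of the same idea. The markup below is an assumption: the tag names and attributes are guessed from the question's code (rows with bgcolor, a date cell with rowspan="2", a link in the first row), and the href value and the titles_by_date helper are hypothetical. The point it illustrates is that .next_sibling tends to return the whitespace text node between rows, whereas find_next_sibling('tr') skips ahead to the next actual row.

```python
from bs4 import BeautifulSoup

# Assumed markup: tag names, attributes and the href are illustrative guesses,
# not the real page source.
html = """
<table>
 <tr bgcolor="#efefef">
  <td rowspan="2">6/24/2019</td>
  <td><a href="/memo.pdf">Brian Manley, Chief of Police, Austin Police Department</a></td>
 </tr>
 <tr bgcolor="#efefef">
  <td>Preliminary 2018 Annual Crime Report - Executive Summary</td>
 </tr>
</table>
"""

# html.parser is used here only to avoid the extra lxml dependency.
soup = BeautifulSoup(html, "html.parser")


def titles_by_date(soup):
    """Hypothetical helper: pair each date cell with the text of the following row."""
    results = []
    for date_cell in soup.find_all("td", {"rowspan": "2"}):
        row = date_cell.find_parent("tr")
        # find_next_sibling('tr') ignores text nodes and returns the next <tr>, if any.
        title_row = row.find_next_sibling("tr")
        if title_row is None:
            continue
        results.append(
            (date_cell.get_text(strip=True), title_row.get_text(strip=True))
        )
    return results


rows = titles_by_date(soup)
print(rows)
```

In the loop from the question, this would roughly mean replacing item.findAll('td')[2] with item.find_next_sibling('tr'), since the title cell lives in the following row rather than in the row that holds the date.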
