BeautifulSoup HTML Table Parsing for the tags without classes

Question

I have the HTML table shown below. I need to extract specific values from it and assign them to variables; I do not need all the information. For example, flag = "United Arab Emirates", home_port = "Sharjah", etc. Since the individual cells have no 'class' attribute, how do I extract this data?

    import requests
    from bs4 import BeautifulSoup

    # imo_number is assumed to be defined earlier (e.g. 9492749)
    r = requests.get('http://maritime-connector.com/ship/' + str(imo_number), headers={'User-Agent': 'Mozilla/5.0'})
    soup = BeautifulSoup(r.content, "lxml")
    table = soup.find("table", {"class": "ship-data-table"})
    for row in table.find_all("tr"):
        tname = row.find_all("th")  # header cell(s) of this row
        cells = row.find_all("td")  # value cell(s) of this row

        print(type(tname))
        print(type(cells))

I am using the Python module BeautifulSoup.


                        
                        
    IMO number          9492749
    Name of the ship    SHARIEF PILOT
    Type of ship        ANCHOR HANDLING VESSEL
    MMSI                470535000
    Gross tonnage       499 tons
    DWT                 222 tons
    Year of build       2008
    Builder             NANYANG SHIPBUILDING - JINGJIANG, CHINA
    Flag                UNITED ARAB EMIRATES
    Home port           SHARJAH
    Manager & owner     GLOBAL MARINE SERVICES - SHARJAH, UNITED ARAB EMIRATES
    Former names        SUPERIOR PILOT until 2008 Sep

alecxe · Accepted Answer

Iterate over all the th elements in the table and, for each one, take its text and the text of the td sibling that follows it:

from pprint import pprint

from bs4 import BeautifulSoup

data = """your HTML here"""

soup = BeautifulSoup(data, "html.parser")

result = {header.get_text(strip=True): header.find_next_sibling("td").get_text(strip=True)
          for header in soup.select("table.ship-data-table tr th")}
pprint(result)

This would construct a nice dictionary with headers as keys and corresponding td texts as values:

{'Builder': 'NANYANG SHIPBUILDING - JINGJIANG, CHINA',
 'DWT': '222 tons',
 'Flag': 'UNITED ARAB EMIRATES',
 'Former names': 'SUPERIOR PILOT until 2008 Sep',
 'Gross tonnage': '499 tons',
 'Home port': 'SHARJAH',
 'IMO number': '9492749',
 'MMSI': '470535000',
 'Manager & owner': 'GLOBAL MARINE SERVICES - SHARJAH, UNITED ARAB EMIRATES',
 'Name of the ship': 'SHARIEF PILOT',
 'Type of ship': 'ANCHOR HANDLING VESSEL',
 'Year of build': '2008'}
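
To pull out only the fields the question asks for, the same dictionary can be looked up by header text. The following is a minimal sketch rather than a definitive implementation: the sample markup is a hypothetical reconstruction that assumes each row pairs one th with one td, and the helper name parse_ship_table is made up for illustration. It also skips any header that has no td sibling, so a missing cell does not raise an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: assumes each <tr> holds one <th> and one <td>.
# The last row deliberately has no <td> to show the guard below.
html = """
<table class="ship-data-table">
  <tr><th>Flag</th><td>UNITED ARAB EMIRATES</td></tr>
  <tr><th>Home port</th><td>SHARJAH</td></tr>
  <tr><th>Former names</th></tr>
</table>
"""


def parse_ship_table(markup):
    """Map each header's text to the text of the <td> that follows it."""
    soup = BeautifulSoup(markup, "html.parser")
    result = {}
    for header in soup.select("table.ship-data-table tr th"):
        cell = header.find_next_sibling("td")
        # find_next_sibling returns None when the row has no <td>; skip it.
        if cell is not None:
            result[header.get_text(strip=True)] = cell.get_text(strip=True)
    return result


ship = parse_ship_table(html)

# Assign only the values of interest; .get() returns None for absent keys.
flag = ship.get("Flag")
home_port = ship.get("Home port")
builder = ship.get("Builder")  # not in the sample markup, so None

print(flag, home_port, builder)
```

Using .get() instead of indexing keeps the script running when a ship page omits a field; whether a missing field should be an error instead is a design choice that depends on the use case.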
