Extract text from nested tags inside another nested tags using beautifulsoup in python3

Question

I have an HTML page that repeats the same set of HTML markup with different data, and I need to get the value "709". I can get all the text inside the tr tag, but I don't know how to go inside the tr tag and get the data in the td tag alone. Please help me. Below is the HTML code.

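<table class="readonlydisplaytable">
    <tr class="readonlydisplayfield">
        <th>Payer Phone #</th>
        <td>1234</td>
    </tr>
    <tr class="readonlydisplayfield">
        <th>Name</th>
        <td>ABC SERVICES</td>
    </tr>
    <tr class="readonlydisplayfield">
        <th>Package #</th>
        <td>709</td>
    </tr>
    <tr class="readonlydisplayfield">
        <th>Case #</th>
        <td>n/a</td>
    </tr>
    <tr class="readonlydisplayfield">
        <th>Date</th>
        <td>n/a</td>
    </tr>
    <tr class="readonlydisplayfield">
        <th>Adjuster</th>
        <td>n/a</td>
    </tr>
    <tr class="readonlydisplayfield">
        <th>Adjuster Phone #</th>
        <td>n/a</td>
    </tr>
    <tr class="readonlydisplayfield">
        <th>Adjuster Fax #</th>
        <td>n/a</td>
    </tr>
    <tr class="readonlydisplayfield">
        <th>Body Part</th>
        <td>n/a</td>
    </tr>
    <tr class="readonlydisplayfield">
        <th>Deadline</th>
        <td>11/22/2014</td>
    </tr>
</table>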

Below is the code I used.

from bs4 import BeautifulSoup

# Raw string so the backslashes in the Windows path are not read as escape
# sequences ("\U" inside a normal Python 3 string literal is a SyntaxError).
path = r"C:\Users\mapraveenkumar\Desktop\phonepayor.htm"
with open(path) as f:
    soup = BeautifulSoup(f, "html5lib")

for table in soup.find_all("table", class_="readonlydisplaytable"):
    for row in table.find_all("tr", class_="readonlydisplayfield"):
        # get_text() on the whole row returns the label and the value together.
        if "Package #" in row.get_text():
            print(row.get_text())
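A minimal sketch of reading the two cells inside each row directly, instead of the row's combined text. It assumes each `tr.readonlydisplayfield` holds one `th` label followed by one `td` value; the inline `HTML` string and the `read_fields` helper are hypothetical, and `html.parser` is used only so the sketch needs nothing beyond bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question: each row holds a <th> label
# and a <td> value. The real page may differ.
HTML = """
<table class="readonlydisplaytable">
  <tr class="readonlydisplayfield"><th>Name</th><td>ABC SERVICES</td></tr>
  <tr class="readonlydisplayfield"><th>Package #</th><td>709</td></tr>
  <tr class="readonlydisplayfield"><th>Deadline</th><td>11/22/2014</td></tr>
</table>
"""


def read_fields(soup):
    """Map each row's <th> text to the text of its <td> (hypothetical helper)."""
    fields = {}
    for table in soup.find_all("table", class_="readonlydisplaytable"):
        for row in table.find_all("tr", class_="readonlydisplayfield"):
            label = row.find("th")  # the label cell inside this row
            value = row.find("td")  # the value cell inside this row
            if label is not None and value is not None:
                fields[label.get_text(strip=True)] = value.get_text(strip=True)
    return fields


soup = BeautifulSoup(HTML, "html.parser")
fields = read_fields(soup)
print(fields["Package #"])  # 709
```

Collecting every row into a dict means the other fields (Name, Deadline, and so on) come out of the same pass, which may help when the page repeats the same block with different data.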

Bill Bell · Accepted Answer

You want the text inside the td element adjacent to the th element that contains 'Package #'. I begin by looking for that string, then find its parent (the th) and that parent's following siblings (the td). As usual, I find it easiest to work in an interactive environment when I'm trying to elucidate how to capture what I want. I suspect the main point is to use find_all with string=.

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(open('temp.htm').read(), 'lxml')
>>> target = soup.find_all(string='Package #')
>>> target
['Package #']
>>> target[0].find_parent()
<th>Package #</th>
>>> target[0].find_parent().find_next_siblings()
[<td>709</td>]
>>> tds = target[0].find_parent().find_next_siblings()
>>> tds[0].text
'709'
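The same lookup as a standalone script, using BeautifulSoup 4's snake_case method names `find_parent` and `find_next_sibling`. This is a minimal sketch: the inline `HTML` string stands in for `temp.htm`, and `value_for_label` is a hypothetical helper, not part of the library.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for temp.htm; it assumes the label sits in
# a <th> and the value in the next <td> of the same row.
HTML = """
<table>
  <tr><th>Case #</th><td>n/a</td></tr>
  <tr><th>Package #</th><td>709</td></tr>
</table>
"""


def value_for_label(soup, label):
    """Return the text of the <td> after the cell whose text is exactly `label`."""
    # string= matches a whole text node, so the cell must contain only the label.
    hit = soup.find(string=label)
    if hit is None:
        return None
    cell = hit.find_parent()               # the <th> holding the label
    value = cell.find_next_sibling("td")   # first <td> after it in the same row
    return value.get_text(strip=True) if value is not None else None


soup = BeautifulSoup(HTML, "html.parser")
print(value_for_label(soup, "Package #"))  # 709
```

Because string= compares against the whole text node, a label padded with whitespace in the real page would not match. bs4 also accepts a function for string=, for example `string=lambda s: s.strip() == label`, which would tolerate that padding.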
