Reputation: 88
I have an HTML page that repeats the same markup with different data, and I need to get the value "709". I am able to get all the text inside a tr tag, but I don't know how to go inside the tr tag and get only the data in the td tag. Please help me. Below is the HTML code.
<table class="readonlydisplaytable">
<tbody>
<tr class="readonlydisplayfield">
<th class="readonlydisplayfieldlabel">Payer Phone #</th>
<td class="readonlydisplayfielddata">1234</td>
</tr>
<tr class="readonlydisplayfield">
<th class="readonlydisplayfieldlabel">Name</th>
<td class="readonlydisplayfielddata">ABC SERVICES</td>
</tr>
<tr class="readonlydisplayfield">
<th class="readonlydisplayfieldlabel">Package #</th>
<td class="readonlydisplayfielddata">709</td>
</tr>
<tr class="readonlydisplayfield">
<th class="readonlydisplayfieldlabel">Case #</th>
<td class="readonlydisplayfielddata">n/a</td>
</tr>
<tr class="readonlydisplayfield">
<th class="readonlydisplayfieldlabel">Date</th>
<td class="readonlydisplayfielddata">n/a</td>
</tr>
<tr class="readonlydisplayfield">
<th class="readonlydisplayfieldlabel">Adjuster</th>
<td class="readonlydisplayfielddata">n/a</td>
</tr>
<tr class="readonlydisplayfield">
<th class="readonlydisplayfieldlabel">Adjuster Phone #</th>
<td class="readonlydisplayfielddata">n/a</td>
</tr>
<tr class="readonlydisplayfield">
<th class="readonlydisplayfieldlabel">Adjuster Fax #</th>
<td class="readonlydisplayfielddata">n/a</td>
</tr>
<tr class="readonlydisplayfield">
<th class="readonlydisplayfieldlabel">Body Part</th>
<td class="readonlydisplayfielddata">n/a</td>
</tr>
<tr class="readonlydisplayfield">
<th class="readonlydisplayfieldlabel">Deadline</th>
<td class="readonlydisplayfielddata">11/22/2014</td>
</tr>
</tbody>
</table>
Below is the code I used.
from selenium import webdriver
import os, time, csv, datetime
from selenium.webdriver.common.keys import Keys
import threading
import multiprocessing
from selenium.webdriver.support.select import Select
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
import openpyxl
from bs4 import BeautifulSoup
import urllib.request
import pandas as pd
soup = BeautifulSoup(open("C:\\Users\\mapraveenkumar\\Desktop\\phonepayor.htm"), "html5lib")
a = soup.find_all("table", class_="readonlydisplaytable")
for b in a:
    c = b.find_all("tr", class_="readonlydisplayfield")
    for d in c:
        if "Package #" in d.get_text():
            print(d.get_text())
Upvotes: 0
Views: 2098
Reputation: 21643
You want the text inside the td element adjacent to the th element that contains 'Package #'. I begin by looking for that string, then I find its parent and the parent's siblings. As usual, I find it easiest to work in an interactive environment when I'm trying to elucidate how to capture what I want. I suspect that the main point is to use find_all with string=.
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(open('temp.htm').read(),'lxml')
>>> target = soup.find_all(string='Package #')
>>> target
['Package #']
>>> target[0].find_parent()
<th class="readonlydisplayfieldlabel">Package #</th>
>>> target[0].find_parent().find_next_siblings()
[<td class="readonlydisplayfielddata">709</td>]
>>> tds = target[0].find_parent().find_next_siblings()
>>> tds[0].text
'709'
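The same steps can be wrapped in a small helper. This is only a sketch: the name value_for_label and the trimmed SAMPLE_HTML are made up for illustration, and it assumes each label sits in a th whose only content is the label text, with the value in the next td sibling. It uses the built-in html.parser so it does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a trimmed copy of the table from the question.
SAMPLE_HTML = """
<table class="readonlydisplaytable">
<tbody>
<tr class="readonlydisplayfield">
<th class="readonlydisplayfieldlabel">Name</th>
<td class="readonlydisplayfielddata">ABC SERVICES</td>
</tr>
<tr class="readonlydisplayfield">
<th class="readonlydisplayfieldlabel">Package #</th>
<td class="readonlydisplayfielddata">709</td>
</tr>
</tbody>
</table>
"""


def value_for_label(soup, label):
    """Return the text of the <td> following the <th> whose text is exactly `label`.

    Hypothetical helper; returns None when the label or its <td> is missing.
    """
    th = soup.find("th", string=label)
    if th is None:
        return None
    td = th.find_next_sibling("td")
    return td.get_text(strip=True) if td is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(value_for_label(soup, "Package #"))
```

With the sample above this prints 709. Because the match on string= is exact, a label with extra whitespace or nested markup inside the th would need a looser match.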
Upvotes: 1
Reputation: 2688
from bs4 import BeautifulSoup as bs

html = '''(the HTML from the question)'''
soup = bs(html, 'lxml')
find_tr = soup.find_all('tr')  # collect every 'tr' tag
for i in find_tr:
    for j in i.find_all('th'):  # iterate through the 'th' tags in this 'tr'
        print(j)
    for k in i.find_all('td'):  # iterate through the 'td' tags in this 'tr'
        print(k)
This should do the job. We loop over each tr tag and, for each one, run two inner loops that find all of its th and td tags. For example, one tr looks like this:
<tr class="readonlydisplayfield">
<th class="readonlydisplayfieldlabel">Payer Phone #</th>
<td class="readonlydisplayfielddata">1234</td>
</tr>
This also works if a row has more than one td or th tag. If each row has only one th and one td, we can do the following instead:
find_tr = soup.find_all('tr')  # finds all tr
for i in find_tr:  # goes through all tr
    print(i.th.text)  # .th gives us the first th tag in this tr
    print(i.td.text)  # .td.text gives the text of the first td tag
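To pull out just one value such as "709", one option is to collect the th/td pairs into a dictionary and look the label up. This is a rough sketch, not the only way to do it: rows_to_dict and the trimmed SAMPLE_HTML are hypothetical names, and it assumes one th and one td per row, with the built-in html.parser standing in for lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a trimmed copy of the table from the question.
SAMPLE_HTML = """
<table class="readonlydisplaytable">
<tbody>
<tr class="readonlydisplayfield">
<th class="readonlydisplayfieldlabel">Name</th>
<td class="readonlydisplayfielddata">ABC SERVICES</td>
</tr>
<tr class="readonlydisplayfield">
<th class="readonlydisplayfieldlabel">Package #</th>
<td class="readonlydisplayfielddata">709</td>
</tr>
</tbody>
</table>
"""


def rows_to_dict(soup):
    """Map each row's <th> text to its <td> text (hypothetical helper)."""
    fields = {}
    for tr in soup.find_all("tr", class_="readonlydisplayfield"):
        # Skip rows that lack either a label cell or a data cell.
        if tr.th is None or tr.td is None:
            continue
        fields[tr.th.get_text(strip=True)] = tr.td.get_text(strip=True)
    return fields


fields = rows_to_dict(BeautifulSoup(SAMPLE_HTML, "html.parser"))
print(fields.get("Package #"))
```

If two rows share the same label, the later one overwrites the earlier one, so this only fits tables where labels are unique.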
Upvotes: 0