James Cook

Reputation: 344

Find specific text using BeautifulSoup

I need to find specific text within an HTML document. The document is a generated report, and the text isn't wrapped in an HTML tag of its own. I need to find the text "test", but the code lines I have tried (listed after the sample HTML below) haven't worked. If possible, I would also like to copy the name on the same line as "test" to the end of the "NAME3" line, after "BILL". The names in the right column are dynamic and change all the time; the labels in the left column are static and don't change. So the final result would be:

<END RESULT>
<html>
<head>
</head>
<body>
<pre>
<font face="courier new" size=-4>                                                


test......... DOUG
NAME2........... HENRY
NAME3... BILL, DOUG
NAME4...... BOB

test......... ALLAN
NAME2........... MICHAEL
NAME3... MITCHELL, ALLAN
NAME4...... TOM

</pre>
</body>
</html>

<SAMPLE CODE>
<html>
<head>
</head>
<body>
<pre>
<font face="courier new" size=-4>                                                


test......... DOUG
NAME2........... HENRY
NAME3... BILL
NAME4...... BOB

test......... ALLAN
NAME2........... MICHAEL
NAME3... MITCHELL
NAME4...... TOM

</pre>
</body>
</html>



result = soup.find(text = "test")
result = soup.find(text = 'test')
result = soup.find_all(text = "test")
result = soup.find_all(text = 'test')
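The attempts above come back empty because a plain string passed as `text=` (or `string=`, its newer name) must equal a whole text node exactly, and here the only string inside `<font>` is the entire report. Passing a compiled regular expression instead makes BeautifulSoup call its `search()` method on each string. A minimal sketch with a trimmed copy of the sample, assuming the built-in `html.parser` rather than lxml:

```python
import re

from bs4 import BeautifulSoup

# Trimmed copy of the sample report (the <font> tag is left unclosed, as in the question)
html = """<html><body><pre>
<font face="courier new" size=-4>

test......... DOUG
NAME2........... HENRY
NAME3... BILL
NAME4...... BOB
</pre></body></html>"""

soup = BeautifulSoup(html, "html.parser")

# Exact match: no text node is exactly "test", so this is None
exact = soup.find(string="test")

# Regex match: finds the text node containing a line that starts with "test..."
node = soup.find(string=re.compile(r"^test\.+", re.MULTILINE))

print(exact)             # None
print(node.parent.name)  # font
```

The match is the whole text node (a `NavigableString`), so the individual lines still have to be split out of it afterwards.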

Upvotes: 1

Views: 103

Answers (1)

Jack Fleeting

Reputation: 24930

If I understand you correctly, you are probably looking for something like this:

from bs4 import BeautifulSoup as bs

namepage = """[your sample code above, fixed - font wasn't closed]"""

soup = bs(namepage, 'lxml')
result = soup.find('font')  # the whole report is a single text node inside <font>

names = result.text.strip()
newnames = ''

for name in names.splitlines():
    if "test" in name:
        # "test......... DOUG" -> "DOUG"
        target = name.split('. ')[1]
    if "NAME3" in name:
        # append the name from the most recent "test" line
        name += ", " + target
    newnames += '\n' + name

# swap the original text node for the rebuilt text
result.string.replace_with(' '.join([(elem + '\n') for elem in newnames.splitlines()]))
print(soup)
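A minimal sketch of the same merge as a standalone, stdlib-only function, assuming each block starts with a `test....` line that comes before its `NAME3...` line. `merge_test_names`, `TEST_LINE` and `NAME3_LINE` are hypothetical names, not part of BeautifulSoup. Anchoring the patterns at the start of the line means a name that happens to contain "test" is not mistaken for the label:

```python
import re

# Hypothetical patterns: a fixed left-column label, a run of dots, then the dynamic name
TEST_LINE = re.compile(r"^test\.+\s*(?P<name>.+?)\s*$")
NAME3_LINE = re.compile(r"^NAME3\.+")


def merge_test_names(report: str) -> str:
    """Append the name from each 'test' line to the following 'NAME3' line."""
    target = None  # name taken from the most recent "test" line
    out = []
    for line in report.splitlines():
        m = TEST_LINE.match(line)
        if m:
            target = m.group("name")
        elif target is not None and NAME3_LINE.match(line):
            line = f"{line.rstrip()}, {target}"
            target = None  # don't carry a stale name into the next block
        out.append(line)
    return "\n".join(out)


sample = """test......... DOUG
NAME2........... HENRY
NAME3... BILL
NAME4...... BOB

test......... ALLAN
NAME2........... MICHAEL
NAME3... MITCHELL
NAME4...... TOM"""

print(merge_test_names(sample))
```

To write the result back into the page, something like `result.string.replace_with(merge_test_names(result.string))` should work, since `replace_with` accepts a plain string; joining with `"\n"` also avoids the leading spaces that `' '.join(...)` introduces.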

Output:

<html>
<head>
</head>
<body>
<pre>
<font face="courier new" size="-4">
 test......... DOUG
 NAME2........... HENRY
 NAME3... BILL, DOUG
 NAME4...... BOB
 
 test......... ALLAN
 NAME2........... MICHAEL
 NAME3... MITCHELL, ALLAN
 NAME4...... TOM
</font>
</pre>
</body>
</html>
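The leading space on each line of the output comes from joining with `' '`: every piece already ends in `'\n'`, so the separator space lands at the start of the next line. A small check using the first few lines of the report (the empty first element mirrors the `'\n'` that `newnames` starts with):

```python
# Lines as produced by newnames.splitlines(); the first is empty because
# newnames starts with '\n'
lines = ["", "test......... DOUG", "NAME2........... HENRY"]

# What the answer does: append '\n' to each line, then join with ' '
spaced = ' '.join([(elem + '\n') for elem in lines])

# Alternative: drop the empty first element and join with '\n'
plain = '\n'.join(lines[1:]) + '\n'

print(repr(spaced))  # '\n test......... DOUG\n NAME2........... HENRY\n'
print(repr(plain))   # 'test......... DOUG\nNAME2........... HENRY\n'
```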

Upvotes: 1
