Reputation: 187
Let's say I have the HTML tag below:
target = <tr src="./sound/6/4-1-1.mp3"><td class="code">(4-1)a.</td><td class="sound"><audio controls=""><source src="./sound/6/4-1-1.mp3" type="audio/mpeg"/></audio></td><td class="text"><p class="ab">Na mapaspas a Subalis bunuaz busul tu laas.</p><p class="en">Subali is going to hit the plum.</p></td></tr>
My ideal output:
<tr src="./sound/6/4-1-1.mp3">
I tried the following code:
import re
from bs4 import BeautifulSoup
soup = BeautifulSoup(target, 'lxml')
soup.find(src=re.compile(r'\.\w'))  # raw string avoids invalid-escape warnings
However, my output is:
<source src="./sound/6/4-1-1.mp3" type="audio/mpeg"/>
How can I get the ideal output as mentioned above?
Thanks for any help!!
Upvotes: 1
Views: 219
Reputation: 24049
You can first find the tr element, convert it to a string, and then use the regex '<tr.*>' to pull out just its opening tag.
Try this:
from bs4 import BeautifulSoup
import re
html="""
<tr src="./sound/6/4-1-1.mp3">
<td class="code">(4-1)a.</td>
<td class="sound"><audio controls="">
<source src="./sound/6/4-1-1.mp3" type="audio/mpeg"/></audio>
</td>
<td class="text">
<p class="ab">Na mapaspas a Subalis bunuaz busul tu laas.</p>
<p class="en">Subali is going to hit the plum.</p>
</td>
</tr>
"""
soup = BeautifulSoup(html, "lxml")
# Convert the <tr> element to a string, then match its opening tag with a regex.
re.search(r'<tr.*>', str(soup.find("tr"))).group()
Output:
'<tr src="./sound/6/4-1-1.mp3">'
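Note that '<tr.*>' is greedy, so when the whole element sits on one line (as in the question) it can match well past the opening tag. The following is a minimal sketch of a variant that stops at the first '>'. It assumes a shortened copy of the question's markup and uses the built-in "html.parser" only to avoid depending on lxml; neither choice comes from the original posts.

```python
import re
from bs4 import BeautifulSoup

# Single-line markup modelled on the question (shortened; the text cell is omitted).
target = (
    '<tr src="./sound/6/4-1-1.mp3"><td class="code">(4-1)a.</td>'
    '<td class="sound"><audio controls="">'
    '<source src="./sound/6/4-1-1.mp3" type="audio/mpeg"/></audio></td></tr>'
)

# Assumption: "html.parser" is used here instead of "lxml" to keep the sketch dependency-light.
soup = BeautifulSoup(target, "html.parser")
tr = soup.find("tr")

# [^>]* cannot cross a '>', so the match ends at the close of the opening tag
# even when everything is on one line.
opening_tag = re.search(r"<tr[^>]*>", str(tr)).group()
print(opening_tag)

# If only the attribute value is needed, it can be read directly from the tag.
print(tr["src"])
```

If the goal is really the src value rather than the tag text, reading tr["src"] avoids the regex altogether.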
Upvotes: 1