Reputation: 33
<br/> generates a new line. However, if I replace it with a space or use strip(), the address lines are merged into one line.
How can I keep the address lines separate, as shown in the expected output below?
Input from HTML:
<span class="c2">1233/B, LAC II, St. 37/B, Mehmoodabad # 6, (Behind United Bakery),<br />Karachi - 75640<br />Pakistan</span><br />
My code is as follows:
if not (item.find('span', class_ = 'c2') is None):
    address = item.find_all('span', class_ = 'c2')
    for a in item.find_all('span', {"class":"c2"}):
        for addr in address:
            print('Before',addr)
            if addr.find_all("br"):
                for a in addr:
                    print('a',a)
                    if '<br/>' in a:
                        print('a loop',a)
My output for the span with class c2 is as follows:
<span class="c2">1233/B, LAC II, St. 37/B, Mehmoodabad # 6, (Behind United Bakery),<br />Karachi - 75640<br />Pakistan</span><br />
Test output from the loop over the span is as follows:
Before <span class="c2">1233/B, LAC II, St. 37/B, Mehmoodabad # 6, (Behind United Bakery),<br/>Karachi - 75640<br/>Pakistan</span>
a 1233/B, LAC II, St. 37/B, Mehmoodabad # 6, (Behind United Bakery),
a <br/>
a Karachi - 75640
a <br/>
a Pakistan
This gives my current, undesired output:
1233/B, LAC II, St. 37/B, Mehmoodabad # 6, (Behind United Bakery),
Karachi - 75640
Pakistan
Expected output:
1233/B, LAC II, St. 37/B, Mehmoodabad # 6,(Behind United Bakery),
Karachi - 75640
Pakistan
Upvotes: 0
Views: 64
Reputation: 84465
You can use stripped_strings and join:
from bs4 import BeautifulSoup as bs
html = '''
<span class="c2">1233/B, LAC II, St. 37/B, Mehmoodabad # 6, (Behind United Bakery),<br />Karachi - 75640<br />Pakistan</span><br />
'''
soup = bs(html, 'lxml')
for item in soup.select('.c2'):
    # Join the text fragments between the <br> tags with newlines
    strings = '\n'.join([string for string in item.stripped_strings])
    print(strings)
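As a possible shorthand for the same idea, get_text() accepts separator and strip arguments, so the join can be done in one call. This is only a minimal sketch: the html string is the sample from the question, and 'html.parser' is used here as an assumption so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Sample markup from the question
html = '<span class="c2">1233/B, LAC II, St. 37/B, Mehmoodabad # 6, (Behind United Bakery),<br />Karachi - 75640<br />Pakistan</span><br />'

# 'html.parser' is an assumption; the answer above uses 'lxml'
soup = BeautifulSoup(html, 'html.parser')
span = soup.find('span', class_='c2')

# separator joins the text fragments; strip=True trims each fragment
# and drops fragments that are only whitespace
text = span.get_text(separator='\n', strip=True)
print(text)
```

Each fragment between the br tags ends up on its own line, which should match what the stripped_strings join produces.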
Upvotes: 0
Reputation: 195603
You can use the replace_with() method of a tag object:
from bs4 import BeautifulSoup
data = '''<span class="c2">1233/B, LAC II, St. 37/B, Mehmoodabad # 6, (Behind United Bakery),<br />Karachi - 75640<br />Pakistan</span><br />'''
soup = BeautifulSoup(data, 'lxml')
for br in soup.select('br'):
    # Swap each <br> tag for a newline character
    br.replace_with('\n')

print(soup.text.strip())
Prints:
1233/B, LAC II, St. 37/B, Mehmoodabad # 6, (Behind United Bakery),
Karachi - 75640
Pakistan
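If the page holds several address spans, the same replacement can be limited to each span so that every address stays a separate string. The following is a rough sketch under assumptions: address_text is a hypothetical helper name, the wrapping div is added only for illustration, and 'html.parser' stands in for 'lxml'.

```python
from bs4 import BeautifulSoup

# Sample markup from the question, wrapped in a div for illustration
html = '''<div><span class="c2">1233/B, LAC II, St. 37/B, Mehmoodabad # 6, (Behind United Bakery),<br />Karachi - 75640<br />Pakistan</span><br /></div>'''

soup = BeautifulSoup(html, 'html.parser')


def address_text(span):
    """Hypothetical helper: return the span's text with <br> turned into newlines."""
    for br in span.find_all('br'):
        br.replace_with('\n')
    return span.get_text().strip()


# One string per span; a <br> outside the spans is left untouched
addresses = [address_text(span) for span in soup.find_all('span', class_='c2')]
print(addresses[0])
```

Note that replace_with() modifies the parsed tree in place, so the br tags inside the spans are gone from soup afterwards.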
Upvotes: 0