Reputation: 480
I am using Beautiful Soup for some scraping, and have got tags like these:
a =
<a class="list-group-item" href="URL Link">
<span class="btn btn-blue "><span class="spanClass"></span></span>
<strong>Store Name</strong>
<br>Store Address Here </a>
I just need the text after the <br>, which is Store Address Here, while ignoring the Store Name.
I tried a.text, but it gave me \n\nStore Name\nStore Address Here\t\t\t\t
I tried a.text.replace("\n",""), but it gave me Store NameStore Address Here\t\t\t\t
I tried a.find(text=True, recursive=False), but it gave me \n
Can someone show me how to get just the text after the <br>? TIA
Upvotes: 1
Views: 672
Reputation: 2094
You can try something like this:
from bs4 import BeautifulSoup
html = """
<a class="list-group-item" href="URL Link">
<span class="btn btn-blue "><span class="spanClass"></span></span>
<strong>Store Name</strong>
<br>Store Address Here </a>
"""
soup = BeautifulSoup(html,'html.parser')
for x in soup.find_all('br'):
    print(x.next_sibling)
result:
Store Address Here
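If you already have the single a tag in hand, the same idea can be wrapped up a bit more defensively. This is only a rough sketch: text_after_br is a made-up helper name, and it assumes the address is a plain text node sitting directly after the first <br> inside the tag. It strips the trailing whitespace and returns None when there is no <br> or when the next sibling is another tag rather than text.

```python
from bs4 import BeautifulSoup, NavigableString

html = """
<a class="list-group-item" href="URL Link">
<span class="btn btn-blue "><span class="spanClass"></span></span>
<strong>Store Name</strong>
<br>Store Address Here </a>
"""


def text_after_br(tag):
    """Hypothetical helper: stripped text right after the first <br> in tag, else None."""
    br = tag.find("br")
    if br is None:
        return None  # no <br> inside this tag
    sibling = br.next_sibling
    # next_sibling can be None or another tag; only accept plain text nodes
    if isinstance(sibling, NavigableString):
        return sibling.strip()
    return None


soup = BeautifulSoup(html, "html.parser")
a = soup.find("a", class_="list-group-item")
address = text_after_br(a)
print(address)
```

With several such links on a page, you could call the helper on each result of soup.find_all("a", class_="list-group-item").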
Upvotes: 1
Reputation: 482
You could try:
address = a.text.split('\n')[-1].strip()
This splits the text into a list of strings at every \n. The [-1] then takes the last string in that list. Finally, strip() removes leading and trailing whitespace, including \t (tabs) and \n (newlines).
Step by step (you can confirm this by printing the string at each step):
a.text
-> '\n\nStore Name\nStore Address Here\t\t\t\t'
a.text.split('\n')
-> ['', '', 'Store Name', 'Store Address Here\t\t\t\t']
a.text.split('\n')[-1]
-> 'Store Address Here\t\t\t\t'
a.text.split('\n')[-1].strip()
-> 'Store Address Here'
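One thing to watch with [-1]: if the text happens to end with a newline, the last piece of the split is an empty string. A small sketch of a variant that skips blank lines is below. It assumes the raw text has the shape reported in the question, and last_nonblank_line is just an illustrative name, not anything from Beautiful Soup.

```python
def last_nonblank_line(text):
    """Return the last line that still has content after stripping, or None."""
    # Strip each line so tabs and stray spaces do not count as content
    lines = [line.strip() for line in text.splitlines()]
    nonblank = [line for line in lines if line]
    return nonblank[-1] if nonblank else None


# The string the question reports for a.text
raw = '\n\nStore Name\nStore Address Here\t\t\t\t'
address = last_nonblank_line(raw)
print(address)
```

In the scraper you would pass a.text instead of the hard-coded raw string.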
Upvotes: 1