chhibbz

Reputation: 480

Python Beautiful Soup get part of text after the <br>

I am using Beautiful Soup for some scraping, and I have tags like this one:

a =

<a class="list-group-item" href="URL Link">
    <span class="btn btn-blue "><span class="spanClass"></span></span>
    <strong>Store Name</strong>
    <br>Store Address Here      </a>

I just need the text after the <br>, which is Store Address Here, while ignoring the Store Name.

I tried a.text, but it gave me \n\nStore Name\nStore Address Here\t\t\t\t

I tried a.text.replace("\n",""), but it gave me Store NameStore Address Here\t\t\t\t

I tried a.find(text=True, recursive=False), but it gave me \n

How can I get just the text after the <br>? Thanks in advance.

Upvotes: 1

Views: 672

Answers (2)

GiovaniSalazar

Reputation: 2094

You can try something like this:

from bs4 import BeautifulSoup

html = """
<a class="list-group-item" href="URL Link">
    <span class="btn btn-blue "><span class="spanClass"></span></span>
    <strong>Store Name</strong>
    <br>Store Address Here      </a>
"""
soup = BeautifulSoup(html, 'html.parser')
# The address is the text node that immediately follows each <br>
for x in soup.find_all('br'):
    print(x.next_sibling)

result:

Store Address Here
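
If the page contains other <br> tags, it may be safer to look only inside the specific <a> element and to strip the surrounding whitespace. The following is a minimal sketch under that assumption; the helper name text_after_br is hypothetical and not part of Beautiful Soup, and the lookup by class "list-group-item" is taken from the markup in the question.

```python
from bs4 import BeautifulSoup, NavigableString

html = """
<a class="list-group-item" href="URL Link">
    <span class="btn btn-blue "><span class="spanClass"></span></span>
    <strong>Store Name</strong>
    <br>Store Address Here      </a>
"""

soup = BeautifulSoup(html, "html.parser")
a = soup.find("a", class_="list-group-item")


def text_after_br(tag):
    """Hypothetical helper: return the stripped text right after the first <br> in tag."""
    br = tag.find("br")
    if br is None:
        return None  # no <br> inside this tag
    sibling = br.next_sibling
    # Only accept a plain text node; a following element is not an address
    if isinstance(sibling, NavigableString):
        return sibling.strip()
    return None


address = text_after_br(a)
print(address)
```

Returning None when there is no <br>, or when the <br> is followed by another tag rather than text, lets the caller decide how to handle entries that do not match the expected layout.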

Upvotes: 1

Branson Smith

Reputation: 482

You could try:

address = a.text.split('\n')[-1].strip()

This splits the text into a list of strings at every \n. The [-1] then takes the last string in that list. Finally, strip() removes leading and trailing whitespace, which includes \t (tabs) and \n (newlines).

Step by step (you can confirm this by printing the string at each step):

  1. Start with a.text -> '\n\nStore Name\nStore Address Here\t\t\t\t'
  2. a.text.split('\n') -> ['', '', 'Store Name', 'Store Address Here\t\t\t\t']
  3. a.text.split('\n')[-1] -> 'Store Address Here\t\t\t\t'
  4. a.text.split('\n')[-1].strip() -> 'Store Address Here'
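
The steps above can be checked without Beautiful Soup by starting from the string that a.text returned in the question. This is only a sketch: the variable names text, parts, last, address and robust are made up for illustration. Note that the two leading newlines produce two empty strings at the start of the list.

```python
# The value of a.text as reported in the question
text = "\n\nStore Name\nStore Address Here\t\t\t\t"

parts = text.split("\n")   # split at every newline
last = parts[-1]           # last piece, still has trailing tabs
address = last.strip()     # remove leading/trailing whitespace

print(parts)
print(repr(last))
print(address)

# Variant that still works if the text ends with a newline:
# keep only the non-blank lines, then take the last one.
robust = [line.strip() for line in text.splitlines() if line.strip()][-1]
print(robust)
```

This approach depends on the address being the last non-blank line of the tag's text, so it may need adjusting if the markup places other text after the address.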

Upvotes: 1
