BeautifulSoup - how to extract text without opening tag and before <br> tag?

Question

I'm new to Python and BeautifulSoup and have spent quite a few hours trying to figure this one out.
I want to extract three pieces of text from within a parent element that has no class.
The first piece of text is within an <a> tag, which is itself inside an <h4> tag. I managed to extract this one.
The second piece of text immediately follows the closing </h4> tag and is followed by a <br> tag.
The third piece of text immediately follows the <br> tag after the second piece of text and is also followed by a <br> tag.

Here is the HTML extract I'm working with:


    
    Decheterie de Bagnols
    
    Route des 4 Vents

    63810 Bagnols

I want to extract:

Decheterie de Bagnols < That works

Route des 4 Vents < Doesn't work

63810 Bagnols < Doesn't work

Here is the code I have so far:

import urllib
from bs4 import BeautifulSoup

# url holds the address of the page being scraped (defined elsewhere)
data = urllib.urlopen(url).read()
soup = BeautifulSoup(data, "html.parser")
name = soup.findAll("h4", class_="actorboxLink")

for a_tag in name:
    print a_tag.text.strip()

I need something like "soup.findAll(all text after )"

I experimented with .next_sibling but couldn't get it to work.

Any ideas? Thanks

UPDATE:
I tried this:

for a_tag in classActorboxLink:
    print a_tag.find_all_next(string=True, limit=5)

which gives me:
[u' ', u' Decheterie\xa0de\xa0Bagnols ', u' ', u' Route\xa0des\xa04\xa0Vents', u' 63810 Bagnols']

It's a start, but I need to remove all the whitespace and unnecessary characters. I tried using .strip(), .strings and .stripped_strings, but none of them work. Examples:

for a_tag in classActorboxLink.strings

for a_tag in classActorboxLink.stripped_strings

print a_tag.find_all_next(string=True, limit=5).strip()

For all three I get:

AttributeError: 'ResultSet' object has no attribute 'strings/stripped_strings/strip'

alecxe · Accepted Answer

Locate the h4 element and use find_next_siblings():

h4s = soup.find_all("h4", class_="actorboxLink")
for h4 in h4s:
    # string=True matches only text nodes; older bs4 releases spell this text=True
    for text in h4.find_next_siblings(string=True):
        print(text.strip())
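
For anyone who wants to try this without the original page, here is a minimal self-contained sketch of the same idea. The HTML string is only a guess at the markup based on the question's description (the real page may well differ), and `clean` and `extract_address` are hypothetical helper names, not part of BeautifulSoup. It also replaces the non-breaking spaces (`\xa0`) seen in the question's output and skips whitespace-only text nodes.

```python
from bs4 import BeautifulSoup

# Assumed markup, reconstructed from the question's description only.
html = """
<div>
  <h4 class="actorboxLink">
    <a href="#"> Decheterie\xa0de\xa0Bagnols </a>
  </h4>
  Route\xa0des\xa04\xa0Vents<br>
  63810 Bagnols<br>
</div>
"""


def clean(text):
    """Turn non-breaking spaces into plain spaces and trim the ends."""
    return text.replace("\xa0", " ").strip()


def extract_address(h4):
    """Return the h4's own text followed by the text nodes that are its later siblings."""
    # find_next_siblings(string=True) yields only text nodes, not <br> tags.
    candidates = [h4.get_text()] + list(h4.find_next_siblings(string=True))
    # Drop nodes that are empty once cleaned (e.g. the newline before </div>).
    return [clean(text) for text in candidates if clean(text)]


soup = BeautifulSoup(html, "html.parser")
for h4 in soup.find_all("h4", class_="actorboxLink"):
    for line in extract_address(h4):
        print(line)
```

This also explains the AttributeError in the update: find_all() and find_all_next() return a ResultSet, which behaves like a list, so .strip() has to be called on each string inside it rather than on the ResultSet itself.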
