With BeautifulSoup, can I get the text with separator strings inserted between tags?

Question

So, I've been working on crawling with BeautifulSoup, but I've run into some messy HTML tags.

Here is an example:


    
        Hey
        
            
                0817
            
        
        I want all of those
        
            
                But I want to get those separately
            
        
        Hope this work

So if I use code like this:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
body = soup.find("body")
print(body.text)

I'll probably get this:

"Hey0817I want all of thoseBut I want to get those separatelyHope this work"

The question is: can I get those texts with some string as a separator, so that text from different tags is kept apart? For example:

"@@@Hey@@@0817@@@I want all of those@@@But I want to get those separately@@@Hope this work"
or
"Hey@@@0817@@@I want all of those@@@But I want to get those separately@@@Hope this work@@@"
or
"Hey@@@0817@@@I want all of those@@@But I want to get those separately@@@Hope this work"

That way I could split the text on "@@@" later in other code. Or is there a workaround that does something similar? Any advice would be greatly appreciated. Thanks for your interest and time!

Louic · Accepted Answer

If you want a list, you can use:

item_text = [t.text for t in body.find_all()]

If you really want the separators:

body.get_text('@@@')
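
To make the two options concrete, here is a minimal sketch. The HTML is a hypothetical reconstruction, since the tags in the question's example were not preserved; the div, span and p tags are assumptions. Note that find_all() returns only tags, so text sitting directly inside body (such as "Hey") is not picked up on its own, and nested tags can repeat the same text. If that matters, stripped_strings may be a better fit for building a list, and passing strip=True to get_text drops the whitespace-only pieces between tags.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real tags from the question are unknown.
html = """
<body>
    Hey
    <div><span>0817</span></div>
    I want all of those
    <div><p>But I want to get those separately</p></div>
    Hope this work
</body>
"""

soup = BeautifulSoup(html, "html.parser")
body = soup.find("body")

# Join every text node with a separator; strip=True trims each piece
# and skips pieces that are only whitespace.
joined = body.get_text("@@@", strip=True)
print(joined)

# Same pieces as a list, without going through a separator string.
parts = list(body.stripped_strings)
print(parts)
```

With this sample markup, splitting joined on "@@@" gives the same list as parts, so either form can be used for the later processing step.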
