Using BeautifulSoup to extract text between two tags

Question

I have this HTML snippet:

Language
    French
    English
    Spanish

Music
    Rock
    Pop

And I want the output to be:

1 - Language/1 - French
1 - Language/2 - English
1 - Language/3 - Spanish
2 - Music/1 - Rock
2 - Music/2 - Pop

And here's my code:

def get_genre_band(soup):
    genre = None
    for node in soup.findAll(['h3', 'a']):
        if node.name == 'h3':
            genre = node.text
        elif 'syllabus-item' in node.get('class', ''):
            yield genre.strip(), node.text.strip()

and I'm using it like this:

for g, b in get_genre_band(section):
    print("{} \n\t{}".format(g, b))

But I cannot get the numbering right. I get something like this:

1 - Language/1 - French
1 - Language/2 - English
1 - Language/3 - Spanish
8 - Music/4 - Rock
9 - Music/5 -Pop

Keyur Potdar · Accepted Answer

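You can use .next_sibling for this task. For each h3 header, walk its following siblings until you reach the next h3 or run out of siblings, and restart the item counter at every header. Text nodes (such as the whitespace between tags) have a name of None, so they are skipped.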

Code:

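# i numbers the headers, starting at 1
for i, header in enumerate(soup.find_all('h3'), 1):
    next_tag = header
    j = 1  # item counter, restarted for every header
    while True:
        next_tag = next_tag.next_sibling
        # stop at the end of the siblings or at the next header
        if next_tag is None or next_tag.name == 'h3':
            break
        # text nodes have no tag name, so only real tags are counted
        if next_tag.name is not None:
            print('{} - {}/{} - {}'.format(i, header.text, j, next_tag.string))
            j += 1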

Output:

1 - Language/1 - French
1 - Language/2 - English
1 - Language/3 - Spanish
2 - Music/1 - Rock
2 - Music/2 - Pop
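
If the items are not direct siblings of the h3 (for example, if they sit inside a list), a sibling walk will not reach them. A possible alternative, sketched below, keeps the single-pass generator from the question and adds two counters. The HTML here is an assumption, since the original markup is not shown; it is guessed from the h3 tag and the syllabus-item class used in the question's code. The name numbered_items is hypothetical as well.

```python
from bs4 import BeautifulSoup

# Assumed markup: the question's real HTML is not shown, so this structure
# (h3 headers, <a class="syllabus-item"> entries nested in lists) is a guess.
HTML = """
<div>
  <h3>Language</h3>
  <ul>
    <li><a class="syllabus-item" href="#">French</a></li>
    <li><a class="syllabus-item" href="#">English</a></li>
    <li><a class="syllabus-item" href="#">Spanish</a></li>
  </ul>
  <h3>Music</h3>
  <ul>
    <li><a class="syllabus-item" href="#">Rock</a></li>
    <li><a class="syllabus-item" href="#">Pop</a></li>
  </ul>
</div>
"""


def numbered_items(soup):
    """Yield 'i - header/j - item' strings in document order."""
    header_no = 0
    item_no = 0
    header_text = None
    # find_all with a list of names returns matches in document order
    for node in soup.find_all(['h3', 'a']):
        if node.name == 'h3':
            header_no += 1
            item_no = 0  # restart item numbering under each header
            header_text = node.get_text(strip=True)
        elif header_text is not None and 'syllabus-item' in node.get('class', []):
            item_no += 1
            yield '{} - {}/{} - {}'.format(
                header_no, header_text, item_no, node.get_text(strip=True))


soup = BeautifulSoup(HTML, 'html.parser')
result = list(numbered_items(soup))
for line in result:
    print(line)
```

Because the counters live inside the loop that already visits every h3 and a tag, this version does not depend on how deeply the links are nested under each header.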
