BeautifulSoup - extracting texts within one class

Question

I'm trying to extract the texts from the webpage below:

Category1: Text1 I want > Category2: Text2 I want

I tried:

for div in soup.find_all('div', class_='MYCLASS'):
    for url in div.find_all('a', id='category1'):  # search within this div
        print(url)

And it returned:

    Text1 I want

So I split the text...

for div in soup.find_all('div', class_='MYCLASS'):
    for url in div.find_all('a', id='category1'):
        # keep the part between 'category1">' and the closing '</a>'
        category1 = str(url).split('category1">')[1].split('</a>')[0]
        print(category1)
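Splitting `str(url)` depends on the exact way the tag is serialized (attribute order, quoting), so it breaks easily. A minimal sketch of the same loop using `get_text()`, which returns the tag's text directly; the sample markup is an assumption standing in for the real page:

```python
from bs4 import BeautifulSoup

# Assumed sample markup; the real page's attributes may differ.
html = '<div class="MYCLASS">Category1: <a id="category1">Text1 I want</a></div>'
soup = BeautifulSoup(html, 'html.parser')

texts = []
for div in soup.find_all('div', class_='MYCLASS'):
    # Search inside this div rather than the whole document.
    for url in div.find_all('a', id='category1'):
        # get_text(strip=True) returns the text without surrounding whitespace.
        texts.append(url.get_text(strip=True))

print(texts)  # ['Text1 I want']
```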

and extracted "Text1 I want", but I'm still missing "Text2 I want". Any ideas? Thank you.

EDIT:

There are other <a></a> tags in the source code, so if I remove id= from my code, it returns all the other texts that I don't need. For example:

RandomText.
RandomText.

RandomTextExtracted.

Also,

Keyur Potdar · Accepted Answer

Since an element's id is unique, you can find the first tag with id="category1". To find the next <a> tag, use the find_next() method.

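from bs4 import BeautifulSoup

# Assumed markup based on the question; the real page's attributes may differ.
html = '''<div class="MYCLASS">Category1: <a id="category1">Text1 I want</a> &gt; Category2: <a>Text2 I want</a></div>'''
soup = BeautifulSoup(html, 'lxml')

a_tag1 = soup.find('a', id='category1')  # ids are unique, so find() is enough
print(a_tag1.text)    # print(a_tag1) would print the whole <a> tag
a_tag2 = a_tag1.find_next('a')           # the next <a> in document order
print(a_tag2.text)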

Output:

Text1 I want
Text2 I want
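`find_next()` searches forward through the rest of the document in parse order, not only among siblings. A small sketch of the difference, assuming (hypothetically) that the second link is wrapped in a `<span>`:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second link sits inside a <span>.
html = ('<div class="MYCLASS">Category1: <a id="category1">Text1 I want</a>'
        ' &gt; Category2: <span><a>Text2 I want</a></span></div>')
soup = BeautifulSoup(html, 'html.parser')

first = soup.find('a', id='category1')

# find_next_sibling('a') only checks later siblings under the same parent,
# so it misses the link nested in the <span>.
sibling = first.find_next_sibling('a')

# find_next('a') walks every element after `first` in document order.
nxt = first.find_next('a')

print(sibling)         # None
print(nxt.get_text())  # Text2 I want
```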

(I've tested it on the link you've provided, and it works there too.)
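Since the question's edit mentions other <a> tags elsewhere on the page, another option is to limit the search to the MYCLASS div and collect every link inside it. A sketch, assuming the two wanted links are the only <a> tags in that div (the extra link outside the div is made up for illustration):

```python
from bs4 import BeautifulSoup

# Assumed markup: one unrelated link outside the target div.
html = ('<a>RandomText.</a>'
        '<div class="MYCLASS">Category1: <a id="category1">Text1 I want</a>'
        ' &gt; Category2: <a>Text2 I want</a></div>')
soup = BeautifulSoup(html, 'html.parser')

div = soup.find('div', class_='MYCLASS')
# Only <a> tags inside the div are returned; "RandomText." is skipped.
wanted = [a.get_text(strip=True) for a in div.find_all('a')]
print(wanted)  # ['Text1 I want', 'Text2 I want']

# The same selection with a CSS selector.
wanted_css = [a.get_text(strip=True) for a in soup.select('div.MYCLASS a')]
```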
