Python - Extracting data from this HTML tag using BS4, instead of getting None

Question

This is my code:

html = '''
Data I want to extract
'''


from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')

print(soup.select_one('td').string)

It returns None. I think this is caused by the empty span tag inside the td: maybe it descends into that span and returns its contents? So I want to either delete that span tag, stop as soon as it finds 'Data I want to extract', or tell it to ignore empty tags.

If there are no empty tags inside 'td' it actually works.

Is there a way to ignore empty tags in general and go one step back, instead of ignoring this specific span tag?

Sorry if this is too elementary, but I spent a fair amount of time searching.

Andrej Kesely · Accepted Answer

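Use the .text property instead of .string. A tag's .string is None whenever the tag has more than one child, and here the td contains both the text and the empty span. .text concatenates all of the text inside the tag: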

html = '''
Data I want to extract
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')

print(soup.select_one('td').text)

Output:

Data I want to extract
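
To address the "ignore empty tags in general" part of the question, here is a minimal sketch. The markup is an assumption: the tags of the original snippet are not shown, so the td with an empty span below is a hypothetical stand-in. It compares .string with get_text(strip=True), then removes childless tags with decompose() so that .string works again.

```python
from bs4 import BeautifulSoup

# Assumed markup (hypothetical stand-in for the original snippet):
# a <td> holding the wanted text followed by an empty <span>.
html = "<table><tr><td>Data I want to extract<span></span></td></tr></table>"

soup = BeautifulSoup(html, "html.parser")
td = soup.select_one("td")

# .string is None because the td has two children: the text and the span.
before = td.string
print(before)

# get_text() joins every text node under the td; strip=True trims whitespace.
text = td.get_text(strip=True)
print(text)

# Alternative: remove every descendant tag that has no children at all.
# find_all(True) returns a list of all tags, so removing while looping is safe.
for tag in td.find_all(True):
    if not tag.contents:
        tag.decompose()

# With the empty span gone, the td has a single text child again.
after = td.string
print(after)
```

In most cases get_text(strip=True) is enough. The decompose() loop is only worth it if the empty tags get in the way elsewhere; note that it also removes void elements such as br, since they have no children either.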
