Reputation: 55
In bs4 (BeautifulSoup), there is the bs4.element.Tag object.
It has a text property and a get_text method.
Both return the same text (a str).
That made me curious about method vs. property: which access is faster?
I timed both locally with time.time(), but the results change on every run.
Is this a pointless curiosity?
Upvotes: 1
Views: 95
Reputation: 5207
The text attribute can only give the text as it is, whereas get_text() allows some customization, such as inserting a separator between the text of different tags or stripping whitespace from the ends of the strings.
get_text() accepts the following parameters:
separator: Insert a string as a separator between the texts of the individual tags.
strip: Strip whitespace from the ends of the tags' texts.
Consider
from bs4 import BeautifulSoup

html_str = """
<div>
\nHello
<span>World!</span>
<a href="">Click here</a>
</div>
"""
soup = BeautifulSoup(html_str, 'html.parser')
The <div> tag's text, read with soup.text, is
'\n\n\nHello\n World!\nClick here\n\n'
If the strip argument is used:
>>> soup.get_text(strip=True)
'HelloWorld!Click here'
If the separator argument is used:
>>> soup.get_text(separator='**')
'\n**\n\nHello\n **World!**\n**Click here**\n**\n'
If both separator and strip are used:
>>> soup.get_text(separator='**', strip=True)
'Hello**World!**Click here'
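The three results above can be reproduced with a small pure-Python model. This is only a sketch of the observable behavior, not bs4's actual source: join_text is a hypothetical helper, and the pieces list is my reading of the individual strings, taken by splitting the separator='**' output above on '**'.

```python
def join_text(strings, separator="", strip=False):
    """Simplified model of get_text(): join the text pieces of a tree.

    Hypothetical helper for illustration only; not part of bs4.
    """
    if strip:
        # Strip each piece, then drop the pieces that end up empty.
        strings = [s.strip() for s in strings]
        strings = [s for s in strings if s]
    return separator.join(strings)


# Text pieces inferred from the separator='**' output shown above.
pieces = ["\n", "\n\nHello\n ", "World!", "\n", "Click here", "\n", "\n"]

print(repr(join_text(pieces)))                             # like soup.text
print(repr(join_text(pieces, strip=True)))                 # strip only
print(repr(join_text(pieces, separator="**")))             # separator only
print(repr(join_text(pieces, separator="**", strip=True))) # both
```

Seen this way, strip=True is why the whitespace-only pieces disappear entirely instead of leaving empty gaps between separators.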
Running time is roughly the same. In the measurements below, only strip=True adds a little over a microsecond per call.
%timeit soup.text
4.16 µs ± 56.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit soup.get_text(strip=True)
5.38 µs ± 154 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit soup.get_text(separator='**')
4.16 µs ± 53.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit soup.get_text(separator='**', strip=True)
5.45 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
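On the "method vs property" part of the question: as far as I know, bs4 defines text as a property that simply calls get_text() with its default arguments, so the access style itself should not matter much. The sketch below shows that pattern with a toy class and times it with the standard timeit module, which repeats the call many times and so gives far steadier numbers than a single time.time() difference. Node and best_time are hypothetical names for illustration, not bs4 code, and the numbers will vary by machine.

```python
import timeit


class Node:
    """Hypothetical stand-in for a parsed tag; not bs4's real class."""

    def __init__(self, strings):
        self._strings = list(strings)

    def get_text(self, separator="", strip=False):
        parts = self._strings
        if strip:
            # Strip each piece and drop the ones that become empty.
            parts = [s.strip() for s in parts]
            parts = [s for s in parts if s]
        return separator.join(parts)

    # Property access runs the same function with its default arguments.
    text = property(get_text)


node = Node(["\n", "\n\nHello\n ", "World!", "\n", "Click here", "\n", "\n"])


def best_time(stmt, number=20_000, repeat=5):
    """Return the fastest per-call time in seconds over several repeats."""
    runs = timeit.repeat(stmt, globals={"node": node}, number=number, repeat=repeat)
    return min(runs) / number


statements = ("node.text", "node.get_text()", "node.get_text(strip=True)")
timings = {stmt: best_time(stmt) for stmt in statements}
for stmt, seconds in timings.items():
    print(f"{stmt:28s} {seconds * 1e6:.3f} µs per call")
```

Taking the minimum over several repeats filters out most of the noise from other processes, which is the usual reason one-off time.time() measurements change on every run.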
Upvotes: 2