neulhan

Reputation: 55

Method vs. property: which is faster?

In bs4.BeautifulSoup, there is the bs4.element.Tag object. It has a property, text, and a method, get_text. Both return the same text (a str).

That made me curious about

"method vs. property"

Which kind of access is faster?

I measured both locally with time.time(), but the results change on every run.

Is this a useless curiosity?

Upvotes: 1

Views: 95

Answers (1)

J...S

Reputation: 5207

The text attribute only returns the text as it is.

get_text(), on the other hand, allows some customization, such as inserting a separator between the text of different tags or stripping whitespace from the ends of the strings.


get_text() accepts the following parameters:

  • separator: a string inserted between the text of the individual tags.
  • strip: if True, whitespace is stripped from both ends of each tag's text.

Consider

from bs4 import BeautifulSoup

html_str = """
<div>
\nHello
  <span>World!</span>
  <a href="">Click here</a>
</div>
"""
soup = BeautifulSoup(html_str, 'html.parser')

If we take the text of the whole document (here just the <div> and the newlines around it) with

soup.text

the result is

'\n\n\nHello\n  World!\nClick here\n\n'

If the strip argument is used:

>>> soup.get_text(strip=True)
'HelloWorld!Click here'

If the separator argument is used:

>>> soup.get_text(separator='**')
'\n**\n\nHello\n  **World!**\n**Click here**\n**\n'

If both separator and strip are used:

>>> soup.get_text(separator='**', strip=True)
'Hello**World!**Click here'

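The running times are roughly the same. On this input, strip=True adds a little over a microsecond per call, and the separator makes no measurable difference.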

%timeit soup.text
4.16 µs ± 56.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit soup.get_text(strip=True)
5.38 µs ± 154 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit soup.get_text(separator='**')
4.16 µs ± 53.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit soup.get_text(separator='**', strip=True)
5.45 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
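%timeit only works in IPython/Jupyter. As for the time.time() results changing on every run: a single measurement of something this fast is mostly noise, so it helps to repeat the call many times and keep the best run. Below is a minimal sketch of that using only the standard timeit module. Note that Node and best_time are hypothetical names made up for this illustration, not part of bs4; Node just imitates a tag whose text property is a thin wrapper around get_text(), which is my assumption about the general shape of such an API rather than a copy of Beautiful Soup's code.

```python
import timeit


class Node:
    """Hypothetical stand-in for a tag: `text` is a property wrapping `get_text()`."""

    def __init__(self, parts):
        self.parts = parts

    def get_text(self, separator="", strip=False):
        # Optionally strip each piece, then drop pieces that ended up empty.
        parts = (p.strip() for p in self.parts) if strip else self.parts
        return separator.join(p for p in parts if p)

    # Property access simply calls the method with its default arguments.
    text = property(get_text)


def best_time(func, repeat=5, number=10_000):
    """Return the best per-call time in seconds over several timing runs."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return min(runs) / number


node = Node(["\n", "Hello\n  ", "World!", "\n", "Click here", "\n"])

property_time = best_time(lambda: node.text)
method_time = best_time(lambda: node.get_text())

print(f"property: {property_time * 1e6:.3f} µs per call")
print(f"method:   {method_time * 1e6:.3f} µs per call")
```

To time the real thing, pass lambda: soup.text and lambda: soup.get_text() to best_time instead. Taking the minimum of several runs filters out most of the interference from other processes, which is usually what makes one-off time.time() measurements jump around.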

Upvotes: 2
