eamon1234

Reputation: 1585

Python Memory Issue with BeautifulSoup

I've resolved this issue, but I'm wondering why it was caused in the first place. I used BeautifulSoup to identify this span from a webpage:

span = <span id="ctl00_ContentPlaceHolder1_RestInfoReskin_lblRestName">Ally's Sizzlers</span>

I then assign this variable:

restaurant.name = span.contents

However, each iteration of the loop takes up a full 1 MB, and there are about 20,000 iterations. Through trial and error I came upon this solution:

restaurant.name = str(span.contents)

Can you tell me why the former span.contents takes up so much memory?

Upvotes: 1

Views: 1215

Answers (2)

Tey'

Reputation: 990

Old stuff, but just in case other people wonder: span.contents returns a list of the tag's children, which here holds a single NavigableString instance. That instance keeps references back into the DOM tree (its parent and neighbouring elements), so as long as it is reachable, the garbage collector cannot release the tree. Thus, as long as restaurant.name is alive, the whole DOM tree of that page is kept in memory.

Using str(span.contents) returns a plain string (the text representation of that list) which holds no reference to the DOM tree, so it does not prevent the DOM tree from being released from memory.
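The mechanism can be sketched without BeautifulSoup at all. In the minimal example below, Node and TreeString are hypothetical stand-ins (not BeautifulSoup classes) for a parsed tree and for NavigableString; the only assumption is that the string subclass stores a reference to its parent, which is what keeps the tree reachable.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for a parsed DOM tree node."""

    def __init__(self, payload):
        self.payload = payload
        self.children = []


class TreeString(str):
    """Hypothetical stand-in for NavigableString: a str that remembers its parent."""

    def __new__(cls, value, parent):
        obj = super().__new__(cls, value)
        obj.parent = parent  # back-reference into the tree
        return obj


def build_tree():
    # Pretend the payload is roughly 1 MB of parsed page data.
    root = Node("x" * 1_000_000)
    root.children.append(TreeString("Ally's Sizzlers", root))
    return root


def tree_survives(detach):
    """Return True if the tree is still alive after we drop our own reference to it."""
    root = build_tree()
    tree_ref = weakref.ref(root)  # lets us observe the tree without keeping it alive
    name = root.children[0]
    if detach:
        name = str(name)  # plain str copy, no .parent attribute
    del root
    gc.collect()  # the tree contains a reference cycle, so ask the collector to run
    return tree_ref() is not None


print(tree_survives(detach=False))
print(tree_survives(detach=True))
```

Keeping the TreeString keeps the whole tree alive through its parent reference, while the str() copy lets the collector reclaim it. With 20,000 pages, holding one such string per page would pin 20,000 trees.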

Upvotes: 1

scripts

Reputation: 1470

Probably because str(span.contents) calls the __str__ method of the span.contents object and returns a smaller representation. You can use Pympler to measure the memory consumption.

Upvotes: 2
