mdlee6

Reputation: 111

BeautifulSoup returns an empty string?

I don't know if this question has been asked before, but I couldn't find anything that helped solve my problem (hopefully I didn't misunderstand anything). I'm learning Python at the moment, using Python 3.5 with IPython, and I ran into some trouble using BeautifulSoup, as shown below:

import bs4
exampleFile = open('example.html')
exampleFile.read()
>>> '<html><head><title>The Website Title</title></head>\n<body>\n<p>Download my <strong>Python</strong> book from <a href=“http://inventwithpython.com”>my website</a>.</p>\n<p class=“slogan”>Learn Python the easy way!</p>\n<p>By <span id=“author”>Al Sweigart</span></p>\n</body></html>'
exampleSoup = bs4.BeautifulSoup(exampleFile.read(), 'html.parser')
exampleFile.read()
>>> ''
elems = exampleSoup.select('#author')
print(elems)
>>> []

I'm able to open and read example.html, but after I use BeautifulSoup, when I try to read the file again, it returns an empty string. I'm unable to define elems because of this.

I'm trying to understand why this is happening, but I couldn't figure it out so I decided to post a question.

Thanks in advance!

Upvotes: 1

Views: 2253

Answers (3)

mdlee6

Reputation: 111

It turns out that it was because of the curly ("smart") quotes in the original example.html. I replaced them with plain straight quotes in another text editor, and it ended up working just fine. Thanks for all your help though. Really appreciate it!
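To see why the quote characters matter, here is a minimal sketch using only the standard library's HTMLParser, which is what the 'html.parser' option of BeautifulSoup builds on. The AttrCollector class and collect_ids function are hypothetical helpers written for this illustration, and the two markup strings are cut-down stand-ins for the line in example.html.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records the id attribute of every start tag."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == 'id':
                self.ids.append(value)


def collect_ids(markup):
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.ids


# \u201c and \u201d are the curly opening and closing double quotes.
curly = collect_ids('<span id=\u201cauthor\u201d>Al Sweigart</span>')
straight = collect_ids('<span id="author">Al Sweigart</span>')

print(curly)     # the curly quotes end up inside the id value
print(straight)  # the id value is just: author
```

With curly quotes the parser treats the value as unquoted, so the quote characters become part of the id, and a selector like '#author' has nothing to match.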

Upvotes: 0

Kerry Hatcher

Reputation: 601

Danielu13 is correct. Here is what you want to do:

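import bs4
exampleFile = open('example.html')
# Read the file once and keep the contents in a string.
myHTML = exampleFile.read()
print(myHTML)
# Parse the saved string instead of calling exampleFile.read() again.
exampleSoup = bs4.BeautifulSoup(myHTML, 'html.parser')
print(exampleSoup)
elems = exampleSoup.select('#author')
print(elems)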

The problem is that when you call .read() on the file object, it reads everything up to the end of the file, which 'empties' it. Every .read() call on that file object from that point on returns an empty string. In my example we save the contents to a string object named myHTML. Then we use myHTML from then on.

Note: the file object exampleFile isn't really empty after you call .read(); it's just that the read position is at the end of the file, so there is nothing left to read. When I learned Python, the empty analogy is how someone explained it to me, and it helped me understand it.

Upvotes: 0

Daniel Underwood

Reputation: 2271

I believe your issue is having multiple calls to read(). The first call reads to the end of the file, so any later call returns an empty string. You should use seek(0) to rewind to the beginning of the file before trying to read from it again. Here is a similar question.
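A minimal sketch of the rewind approach, using only the standard library. The temporary file and its contents are stand-ins for example.html (an assumption made so the snippet is self-contained), and the variable names are illustrative.

```python
import os
import tempfile

# Hypothetical stand-in for example.html so the sketch runs on its own.
html = '<p>By <span id="author">Al Sweigart</span></p>'
with tempfile.NamedTemporaryFile('w', suffix='.html', delete=False) as tmp:
    tmp.write(html)
    path = tmp.name

with open(path) as example_file:
    first = example_file.read()    # reads everything; position is now at the end
    second = example_file.read()   # nothing left to read, so this is ''
    example_file.seek(0)           # rewind to the beginning of the file
    third = example_file.read()    # the full contents again

os.remove(path)

print(repr(second))
print(first == third)
```

If the contents are needed more than once, storing the result of a single read() in a variable is usually simpler than rewinding.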

Upvotes: 2
