Kartik Sibal

Reputation: 51

Different search results in different environments

I am learning data science and, while working on a problem, I came across a weird observation. The problem was to print the number of occurrences of the string 'Soup' on the Beautiful Soup home page using Python. The weird part is that the count differs between the IPython notebook and plain Python, and when I ran a manual search on the webpage, the result was different again.

I'd love it if someone could give a plausible explanation. I have attached the code snippets and the results:

In Python

I have simply used urllib and not BeautifulSoup

In Pandas

Using the .count() function

Manually


As you can see, the result varies across all the environments: 39 occurrences in Python, 41 in Pandas, and 35 via manual search.

Thanks

Upvotes: 1

Views: 55

Answers (1)

jezrael

Reputation: 863481

I think Python found only 39 because the 2 missing occurrences are in <head>:

<title>Beautiful Soup: We called him Tortoise because he taught us.</title>
<meta name="Description" content="Beautiful Soup: a library designed for screen-scraping HTML and XML.">

You can check this by viewing the page source: there are 41 occurrences.
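A minimal sketch of that check, assuming a small made-up page in place of the real home page (`SAMPLE_HTML`, `count_in_source`, and `count_in_head` are hypothetical names used only for illustration). It counts 'Soup' in the raw source and then in the <head> section alone, so the difference between the two shows how many occurrences live in the body markup:

```python
import re

# Hypothetical miniature page, NOT the real Beautiful Soup home page.
# The <title> and <meta> lines are the two <head> lines quoted above.
SAMPLE_HTML = """<html>
<head>
<title>Beautiful Soup: We called him Tortoise because he taught us.</title>
<meta name="Description" content="Beautiful Soup: a library designed for screen-scraping HTML and XML.">
</head>
<body>
<h1>Beautiful Soup</h1>
<p>Soup is tasty.</p>
</body>
</html>"""


def count_in_source(html, word="Soup"):
    """Count every occurrence of word in the raw HTML, tags included."""
    return html.count(word)


def count_in_head(html, word="Soup"):
    """Count occurrences of word inside the <head>...</head> section only."""
    match = re.search(r"<head\b.*?</head>", html, flags=re.IGNORECASE | re.DOTALL)
    return match.group(0).count(word) if match else 0


total = count_in_source(SAMPLE_HTML)
in_head = count_in_head(SAMPLE_HTML)
print("whole source:", total)
print("inside <head>:", in_head)
print("outside <head>:", total - in_head)
```

Running the same two functions on the downloaded source of the real page should show whether the <head> section accounts for a gap of 2 between two counts.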

If you check the webpage manually (35 occurrences), the remaining 6 are not in the visible text: 4 are in URLs and 2 are in <head>, so together that makes 41:

<a href="http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html">Here's
the Beautiful Soup 3 documentation.</a>
<a href="download/3.x/BeautifulSoup-3.2.1.tar.gz">3.2.1</a> 
<a href="/source/software/BeautifulSoup/index.bhtml">
<a href="http://www.crummy.com/software/BeautifulSoup/">
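To see why a browser search misses the occurrences inside URLs, here is a rough sketch using only the standard library's `html.parser`. It counts 'Soup' separately in text nodes and in attribute values. The `SoupCounter` class and the `SAMPLE` fragment are assumptions made for illustration: the fragment reuses two of the links above, wrapped in a <p> tag, and is not the full page.

```python
from html.parser import HTMLParser


class SoupCounter(HTMLParser):
    """Count a word separately in text nodes and in attribute values."""

    def __init__(self, word="Soup"):
        super().__init__()
        self.word = word
        self.in_text = 0   # occurrences in text between tags
        self.in_attrs = 0  # occurrences inside attribute values such as href

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for _name, value in attrs:
            if value:
                self.in_attrs += value.count(self.word)

    def handle_data(self, data):
        self.in_text += data.count(self.word)


# Hypothetical fragment built from two of the links quoted above.
SAMPLE = """<p><a href="http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html">Here's
the Beautiful Soup 3 documentation.</a>
<a href="download/3.x/BeautifulSoup-3.2.1.tar.gz">3.2.1</a></p>"""

counter = SoupCounter()
counter.feed(SAMPLE)
counter.close()
print("in text:", counter.in_text)
print("in attribute values:", counter.in_attrs)
```

Note that `handle_data` also receives the text of <title>, which a browser's in-page search does not cover, so on a full page the <head> section would need to be handled separately.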

Upvotes: 3
