im7

Reputation: 673

Storing values captured by Beautiful Soup in a dictionary and then accessing these values

I am learning Beautiful Soup and dictionaries in Python. I am following a short tutorial in Beautiful Soup by Stanford University to be found here: http://web.stanford.edu/~zlotnick/TextAsData/Web_Scraping_with_Beautiful_Soup.html

Since access to the website was forbidden, I stored the HTML shown in the tutorial in a string and then converted that string to a soup object. The printout is the following:

print(soup_string)

<html><body><div class="ec_statements"><div id="legalert_title"><a    
href="/Legislation-and-Politics/Legislative-Alerts/Letter-to-Senators-
Urging-Them-to-Support-Cloture-and-Final-Passage-of-the-Paycheck-
Fairness-Act-S.2199">'Letter to Senators Urging Them to Support Cloture     
and Final Passage of the Paycheck Fairness Act (S.2199)
</a>
</div>
<div id="legalert_date">
September 10, 2014
</div>
</div>
<div class="ec_statements">
<div id="legalert_title">
<a href="/Legislation-and-Politics/Legislative-Alerts/Letter-to-  
Representatives-Urging-Them-to-Vote-on-the-Highway-Trust-Fund-Bill">
Letter to Representatives Urging Them to Vote on the Highway Trust Fund Bill
</a>
</div>
<div id="legalert_date">
        July 30, 2014
       </div>
</div>
<div class="ec_statements">
<div id="legalert_title">
<a href="/Legislation-and-Politics/Legislative-Alerts/Letter-to-Representatives-Urging-Them-to-Vote-No-on-the-Legislation-Providing-Supplemental-Appropriations-for-the-Fiscal-Year-Ending-Sept.-30-2014">
         Letter to Representatives Urging Them to Vote No on the Legislation Providing Supplemental Appropriations for the Fiscal Year Ending Sept. 30, 2014
        </a>
</div>
<div id="legalert_date">
        July 30, 2014
       </div>
</div>
<div class="ec_statements">
<div id="legalert_title">
<a href="/Legislation-and-Politics/Legislative-Alerts/Letter-to-Senators-Urging-Them-to-Vote-Yes-
             on-the-Motion-to-Proceed-to-the-Emergency-Supplemental-Appropriations-Act-of-2014-S.2648"></a></div></div></body></html>

At some point the tutorial collects every element in the soup object that has the tag "div" and class_="ec_statements":

   letters = soup_string.find_all("div", class_="ec_statements")

Then the tutor says:

"We'll go through all of the items in our letters collection, and for each one, pull out the name and make it a key in our dict. The value will be another dict, but we haven't yet found the contents for the other items yet so we'll just create assign an empty dict object."

The code is the following:

lobbying = {}
for element in letters:
    lobbying[element.a.get_text()] = {}

However, when I printed the keys and values of the lobbying dictionary, I found that the last element ("Letter-to-Senators-Urging-Them-to-Vote-Yes-on-the-Motion-to-Proceed-to-the-Emergency-Supplemental-Appropriations-Act-of-2014-S.2648") was missing. Instead, there was an empty dictionary with no key assigned to it.

for key, value in lobbying.iteritems():
    print key, value

{}

         Letter to Representatives Urging Them to Vote No on the Legislation Providing Supplemental Appropriations for the Fiscal Year Ending Sept. 30, 2014
         {}

         Letter to Representatives Urging Them to Vote on the Highway Trust Fund Bill
         {}
'Letter to Senators Urging Them to Support Cloture and Final Passage of the Paycheck Fairness Act (S.2199)
         {}

How do you explain this? Your advice will be appreciated.

Upvotes: 0

Views: 328

Answers (2)

Stergios

Reputation: 3196

The element <a> of the last <div class="ec_statements"> does not have any text in it:

<a href="/Legislation-and-Politics/Legislative-Alerts/Letter-to-Senators-Urging-Them-to-Vote-Yes-
             on-the-Motion-to-Proceed-to-the-Emergency-Supplemental-Appropriations-Act-of-2014-S.2648">
</a>

Compare this to another div above:

<a href="/Legislation-and-Politics/Legislative-Alerts/Letter-to-  
Representatives-Urging-Them-to-Vote-on-the-Highway-Trust-Fund-Bill">
Letter to Representatives Urging Them to Vote on the Highway Trust Fund Bill
</a>

As you can see, the text in the 2nd example comes after the <a> tag and before the </a> tag. In the 1st example, there is no such text.
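To see the effect in isolation, here is a minimal sketch. The two-div HTML snippet and its href values are made up for illustration and only mimic the structure of the page in the question; it assumes Beautiful Soup 4 is installed and uses the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup: the second <a> has an href but no text.
html = (
    '<div class="ec_statements"><a href="/first">First letter</a></div>'
    '<div class="ec_statements"><a href="/second"></a></div>'
)
soup = BeautifulSoup(html, "html.parser")

# get_text() returns only the text between <a> and </a>, never the href.
keys = [div.a.get_text() for div in soup.find_all("div", class_="ec_statements")]
print(keys)  # ['First letter', '']
```

The second key is an empty string, so the dictionary entry exists but prints as if it had no key.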

Upvotes: 1

solarc

Reputation: 5738

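You are calling element.a.get_text() to generate the key, but the a tag of the last element has no text content: <a ...></a>. get_text() therefore returns an empty string, and that empty string becomes the key. The entry is not missing: the bare {} in your printout is the key '' followed by its value.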
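If you still want a usable key for such entries, one possible workaround (not something the tutorial does) is to fall back to the href when the link text is empty. The sketch below assumes Beautiful Soup 4 with the built-in "html.parser"; the sample HTML, its href values, and the rule of turning the last path segment into a title are all illustrative assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with the same shape as the page in the question.
html = (
    '<div class="ec_statements"><a href="/alerts/First-Letter">First letter</a></div>'
    '<div class="ec_statements"><a href="/alerts/Letter-to-Senators"></a></div>'
)
soup = BeautifulSoup(html, "html.parser")

lobbying = {}
for element in soup.find_all("div", class_="ec_statements"):
    link = element.a
    if link is None:
        continue  # no <a> tag at all, nothing to use as a key
    # strip=True removes the surrounding newlines and indentation.
    title = link.get_text(strip=True)
    if not title:
        # Assumed fallback: derive a title from the last segment of the href.
        href = link.get("href", "")
        title = href.rsplit("/", 1)[-1].replace("-", " ")
    lobbying[title] = {}

print(sorted(lobbying))
```

With the real page the href contains line breaks and spaces, so it would likely need whitespace removed before being split.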

Upvotes: 0
