Reputation: 37
The following code is meant to look through tags in a webpage (the 'b', 'strong' and 'a' tags in specific 'li' entries). If the tag is found in a list (descList in the code), then the 'a' tag with class vote-description__evidence is appended to another list; otherwise '0' is appended to that list. The code is below:
import urllib2
from BeautifulSoup import *

def votedescget(link):
    response = urllib2.urlopen(link)
    html = response.read()
    soup = BeautifulSoup(html)
    desc = soup.findAll('ul',{'class':"vote-descriptions"})
    readVotes = open("categories.txt","r")
    #descList = []
    #for line in readVotes.read().splitlines():
    #    descList.append(line)
    resultsList = []
    descList = ['<b>gay rights</b>', '<b>smoking bans</b>', '<b>hunting ban</b>', '<b>marriage</b>', '<b>equality and human rights</b>', '<b>assistance to end their life</b>', '<b>UK military forces</b>', '<b>Iraq war</b>', '<strong>investigations</strong>', '<b>Trident</b>', '<b>EU integration</b>', '<b>EU</b>', '<b>Military Covenant</b>', '<b>right to remain for EU nationals</b>', '<b>UK membership of the EU</b>', '<b>military action against <a href="https://en.wikipedia.org/wiki/Islamic_State_of_Iraq_and_the_Levant">ISIL (Daesh)</a></b>', '<b>housing benefit</b>', '<b>welfare benefits</b>', '<b>illness or disability</b>', '<b>council tax</b>', '<b>welfare benefits</b>', '<b>guaranteed jobs for young people</b>', '<b>income tax</b>', '<b>rate of VAT</b>', '<b>alcoholic drinks</b>', '<b>taxes on plane tickets</b>', '<b>fuel for motor vehicles</b>', '<b>income over £150,000</b>', '<b>occupational pensions</b>', '<b>occupational pensions</b>', '<b>banker’s bonus tax</b>', '<b>taxes on banks</b>', '<b>mansion tax</b>', '<b>rights for shares</b>', '<b>regulation of trade union activity</b>', '<b>capital gains tax</b>', '<b>corporation tax</b>', '<b>tax avoidance</b>', '<b>incentives for companies to invest</b>', '<b>high speed rail</b>', '<b>private patients</b>', '<b>NHS</b>', '<b>foundation hospitals</b>', '<b>smoking bans</b>', '<b>assistance to end their life</b>', '<b>autonomy for schools</b>', '<b>undergraduate tuition fee</b>', '<a href="https://en.wikipedia.org/wiki/Academy_(English_school)">academy schools</a>', '<b>financial support</b>', '<b>tuition fees</b>', '<b>funding of local government</b>', '<b>equal number of electors</b>', '<b>fewer MPs</b>', '<b>transparent Parliament</b>', '<a href="https://en.wikipedia.org/wiki/Proportional_representation">proportional system</a>', '<strong>wholly elected</strong>', '<b>taxes on business premises</b>', '<b>campaigning by third parties</b>', '<b>fixed periods between parliamentary elections</b>', '<b>hereditary peers</b>', '<b>more powers to the Welsh Assembly</b>', '<b>more powers to the Scottish Parliament</b>', '<b>powers for local councils</b>', '<b>over laws specifically impacting their part of the UK</b>', '<b>voting age</b>', '<b>stricter asylum system</b>', '<b>intervene in inquests</b>', '<b>ID cards</b>', '<b>Police and Crime Commissioners</b>', '<b>retention of information about communications</b>', '<b>enforcement of immigration rules</b>', '<b>mass surveillance</b>', '<b>merging police and fire services</b>', '<b>prevent climate change</b>', '<b>fuel for motor vehicles</b>', '<b>forests</b>', '<b>taxes on plane tickets</b>', '<b>electricity generation</b>', '<b>culling badgers</b>', '<b>hydraulic fracturing (fracking)</b>', '<b>high speed rail</b>', '<b>bus services</b>', '<b>rail fares</b>', '<b>fuel for motor vehicles</b>', '<b>taxes on plane tickets</b>', '<b>publicly owned railway system</b>', '<b>secure tenancies for life</b>', '<b>market rent to high earners renting a council home</b>', '<b>regulation of gambling</b>', '<b>civil service redundancy payments</b>', '<b title="Including voting to maintain them">anti-terrorism laws</b>', '<b>Royal Mail</b>', '<b>pub landlords rent-only leases</b>', '<b>legal aid</b>', '<b>courts in secret sessions</b>', '<b>register of lobbyists</b>', '<b>no-win no fee cases</b>', '<b>letting agents</b>', '<b><a href="http://webarchive.nationalarchives.gov.uk/20100527091800/http://programmeforgovernment.hmg.gov.uk/">Conservative - Liberal Democrat Coalition Agreement</a></b>']
    #print descList
    for line in desc:
        li_list = line.findAll('li')
        for li in li_list:
            if len(li.findAll('b')) == 1:
                if li.find('b') in descList:
                    resultsList.append(str(li.find('a',{'class':"vote-description__evidence"})))
                    print li.find('a',{'class':"vote-description__evidence"})
            elif len(li.findAll('b')) == 2:
                print li.findAll('b')[1]
                if li.findAll('b')[1] in descList:
                    resultsList.append(str(li.find('a',{'class':"vote-description__evidence"})))
                    print li.find('a',{'class':"vote-description__evidence"})
            elif li.find('strong') in descList:
                resultsList.append(str(li.find('a',{'class':"vote-description__evidence"})))
                print li.find('a',{'class':"vote-description__evidence"})
            elif li.find('a') in descList:
                resultsList.append(str(li.find('a',{'class':"vote-description__evidence"})))
                print li.find('a',{'class':"vote-description__evidence"})
            else:
                resultsList.append('0')
    print resultsList

votedescget("https://www.theyworkforyou.com/mp/10001/diane_abbott/hackney_north_and_stoke_newington/votes")
Usually the list is created programmatically from a file but for the sake of ease I've just included it as a variable. For some reason the result I'm getting when I run this code is as follows:
<b>assistance to end their life</b>
<b>council tax</b>
<b>assistance to end their life</b>
<b>over laws specifically impacting their part of the UK</b>
<b>electricity generation</b>
<b>no-win no fee cases</b>
<b>letting agents</b>
['0', '0', '0', '0']
Could anyone tell me why this is happening, or how to fix it? What I'm expecting is a list of zeroes interspersed with results where the tags are found in descList, but this isn't what's happening.
Upvotes: 0
Views: 525
Reputation: 1539
In your comparison you are checking if li.find('b') in descList:
Have you tested whether a Beautiful Soup object can be compared to a string in this way? find returns a Tag object rather than a plain string, which is why you convert it with str() before you append it to your list; however, you do not convert it before this comparison, so the membership test compares a Tag against plain strings and does not match.
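To illustrate the point, here is a minimal sketch of the type mismatch and one possible fix. It uses a stand-in class (FakeTag, hypothetical, not part of Beautiful Soup) so that it runs without installing anything; the helper name is_wanted is also an assumption made for this example. With the real library you would pass li.find('b') in place of the stand-in.

```python
# Stand-in for a Beautiful Soup Tag (hypothetical; it only mimics the fact
# that a tag is an object whose str() gives its markup).
class FakeTag(object):
    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup


def is_wanted(tag, wanted):
    """Return True if the tag's markup, rendered with str(), is in wanted."""
    # find() can return None when nothing matches, so guard against that.
    return tag is not None and str(tag) in wanted


descList = ['<b>council tax</b>', '<b>NHS</b>']
tag = FakeTag('<b>council tax</b>')

direct_match = tag in descList           # object compared against strings
string_match = is_wanted(tag, descList)  # string compared against strings

print(direct_match)
print(string_match)
```

In the question's loop this would amount to writing str(li.find('b')) in descList instead of li.find('b') in descList, assuming the strings in descList are spelled exactly as str() renders the tags.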
Upvotes: 1