Reputation: 51
I want to scrape only the words "Andhra Pradesh", but I'm struggling to write the query for this. Any help would be appreciated.
>>> container.findAll('b')
[<b><lable style="color:#3097b0;"> Aganampudi ( Public Funded ) </lable></b>, <b>NH-16 in Andhra Pradesh <br/> Stretch : </b>, <b>Tollable Length :</b>, <b>Fee Effective Date : </b>, <b> Due date of toll revision : </b>, <b style="color:Orange"> (With Discounting) </b>, <b> Rest Areas : </b>, <b>Truck Lay byes :</b>, <b>Static Weigh Bridge : </b>, <b> Helpline No. : </b>, <b>Emergency Services :</b>, <b>Nearest Police Station: </b>, <b>Highway Administrator (Project Director): </b>, <b>Project Implementation Unit(PIU)</b>, <b>Regional Office(RO)</b>, <b>Representative of Consultant</b>, <b>Representative of Concessionaire: </b>, <b>Nearest Hospital(s): </b>]
>>> search1 = container.findAll('b')
>>> search1[1]
<b>NH-16 in Andhra Pradesh <br/> Stretch : </b>
>>>
Upvotes: 1
Views: 735
Reputation: 838
You can extract it with Python's string methods.
The text returned by get_text() will look like this: "NH-16 in Andhra Pradesh Stretch : "
I look up the index of "in", which is 6, and of "Stretch", which is 25, using .index(),
then take the text between them with text[onset + 2:offset]; the + 2 skips past "in" itself.
This is Python's slice syntax, its equivalent of a substring call. Let me know if you need clarification.
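# Text of the second <b> tag, with the HTML tags stripped
text = search1[1].get_text()
# Positions of the two markers that surround the state name
onset = text.index('in')
offset = text.index('Stretch')
# Skip the two characters of 'in', then trim the surrounding whitespace
name = text[onset + 2:offset].strip()
print(name)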
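One caveat with the approach above: .index('in') returns the first occurrence of "in" anywhere in the string, including inside another word, and it raises ValueError when a marker is missing. A regular expression may be a bit more forgiving. The sketch below is only an illustration under the assumption that every entry follows the "... in <state> Stretch" pattern. The helper name extract_state is hypothetical, and the sample string stands in for what get_text() returns for the tag in the question.

```python
import re

def extract_state(text):
    """Return the words between the word 'in' and 'Stretch', or None if absent.

    Hypothetical helper: assumes the text follows '... in <state> Stretch'.
    """
    # \b makes 'in' match only as a whole word; (.*?) lazily captures the
    # name up to the whitespace that precedes 'Stretch'.
    match = re.search(r"\bin\s+(.*?)\s*Stretch", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None

# Stand-in for search1[1].get_text() from the question
sample = "NH-16 in Andhra Pradesh  Stretch : "
print(extract_state(sample))
```

Returning None instead of raising makes it easier to run the helper over every <b> tag in the list and keep only the ones that match.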
Upvotes: 1