Prashant

Reputation: 51

How to scrape only specific words

I want to scrape only the words "Andhra Pradesh". I'm struggling to write a query for this; any help would be appreciated.

>>> container.findAll('b')
[<b><lable style="color:#3097b0;"> Aganampudi ( Public Funded ) </lable></b>, <b>NH-16 in Andhra Pradesh <br/> Stretch : </b>, <b>Tollable Length :</b>, <b>Fee Effective Date : </b>, <b>  Due date of toll revision : </b>, <b style="color:Orange"> (With Discounting) </b>, <b> Rest Areas : </b>, <b>Truck Lay byes :</b>, <b>Static Weigh Bridge : </b>, <b> Helpline No. : </b>, <b>Emergency Services :</b>, <b>Nearest Police Station: </b>, <b>Highway Administrator (Project Director): </b>, <b>Project Implementation Unit(PIU)</b>, <b>Regional Office(RO)</b>, <b>Representative of Consultant</b>, <b>Representative of Concessionaire: </b>, <b>Nearest Hospital(s): </b>]
>>> search1 = container.findAll('b')
>>> search1[1]
<b>NH-16 in Andhra Pradesh <br/> Stretch : </b>
>>>

Upvotes: 1

Views: 735

Answers (1)

chad

Reputation: 838

You can extract it by using Python's string functions.

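After calling get_text(), the cleaned string looks like this: "NH-16 in Andhra Pradesh Stretch : "

I look up the index of "in" (which is 6) and of "Stretch" (which is 25) using .index(), then take the text between them with the slice text[onset + 2:offset], which is Python's equivalent of a substring call. The + 2 skips over the two characters of "in" itself, and .strip() removes the leftover spaces at both ends. Note that .index() returns the first occurrence, so this relies on "in" not appearing earlier in the string. Let me know if you need clarification.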

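# Text of the second <b> tag, with the <br/> tag dropped
text = search1[1].get_text()
# Position of the first "in" and of "Stretch"
onset = text.index('in')
offset = text.index('Stretch')
# Skip the two characters of "in", then trim the surrounding spaces
name = text[onset + 2:offset].strip()
print(name)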
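
If the road name or spacing varies between pages, a regular expression may be less fragile than index arithmetic. The following is only a sketch: it assumes the text always has the form "<road> in <state> Stretch :", the helper name extract_state is hypothetical, and the sample string is hard-coded to stand in for search1[1].get_text().

```python
import re

# Sample text standing in for search1[1].get_text() from the question
# (assumption: the spaces on either side of <br/> both survive).
text = "NH-16 in Andhra Pradesh  Stretch : "


def extract_state(text):
    """Return the words between the word 'in' and 'Stretch', or None."""
    # \bin\s+   : the standalone word "in" followed by whitespace
    # (.*?)     : the shortest run of characters up to...
    # \s*Stretch: optional whitespace and the literal "Stretch"
    match = re.search(r"\bin\s+(.*?)\s*Stretch", text)
    return match.group(1) if match else None


state = extract_state(text)
print(state)
```

Returning None when nothing matches avoids the ValueError that .index() raises if either marker is missing.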

Upvotes: 1
