Reputation: 79
Here is the HTML from which I want to extract the location information:
<div class="location">
  <div class="listing-location">Location</div>
  <div class="location-areas">
    <span class="location">Al Bayan</span>
    ,
    <span class="location">Nepal</span>
  </div>
  <div class="area-description"> 3.3 km from Mall of the Emirates </div>
</div>
The Python BeautifulSoup4 code I used is:
try:
    title = soup.find('span', {'id': 'listing-title-wrap'})
    title_result = str(title.get_text().strip())
    print "Title: ", title_result
except StandardError as e:
    title_result = "Error was {0}".format(e)
    print title_result
Output:
"Al Bayanأ¢â‚¬آھ,أ¢â‚¬آھ
Nepal"
How can I convert the output into the following format?
['Al Bayan', 'Nepal']
What should the second line of the code be to get this output?
Upvotes: 1
Views: 148
Reputation: 4341
You're reading the wrong element. Just read the spans with class location:
from bs4 import BeautifulSoup

# html holds the page markup shown in the question
soup = BeautifulSoup(html, "html.parser")
locList = [loc.text for loc in soup.find_all("span", {"class": "location"})]
print(locList)
This prints exactly what you wanted:
['Al Bayan', 'Nepal']
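For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It assumes the markup is exactly the snippet from the question (pasted into a string named html for illustration) and narrows the search to the location-areas div first, so spans with the same class elsewhere on the page would not be picked up. get_text(strip=True) is used instead of .text to trim any surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; a real page would be fetched instead.
html = """
<div class="location">
  <div class="listing-location">Location</div>
  <div class="location-areas">
    <span class="location">Al Bayan</span>
    ,
    <span class="location">Nepal</span>
  </div>
  <div class="area-description"> 3.3 km from Mall of the Emirates </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Restrict the search to the container that holds the location spans.
areas = soup.find("div", class_="location-areas")

# Collect the trimmed text of each span; the bare "," between them is skipped
# because it is a text node of the div, not of a span.
locations = [span.get_text(strip=True)
             for span in areas.find_all("span", class_="location")]
print(locations)
```

If the container might be missing on some pages, check that areas is not None before calling find_all on it.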
Upvotes: 1
Reputation: 2553
You can use a regular expression to keep only letters and spaces:
>>> import re
>>> re.findall('[A-Za-z ]+', area_result)
['Al Bayan', ' Nepal']
Hope it'll be helpful.
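A letters-and-spaces pattern would also drop digits or accented characters in other place names. As a possible alternative, the sketch below removes only invisible Unicode format characters and then splits on commas. It assumes the stray characters in the output are directional marks such as U+202A, which is a guess based on the garbled text; split_locations and the sample string are hypothetical names and data used for illustration.

```python
import unicodedata


def split_locations(text):
    """Split a comma-separated location string into clean names."""
    # Drop invisible format characters (Unicode category "Cf"), e.g. U+202A.
    visible = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Split on commas and trim whitespace, skipping any empty pieces.
    return [part.strip() for part in visible.split(",") if part.strip()]


# Hypothetical sample imitating the scraped text (assumed, not taken from the page).
area_result = "Al Bayan\u202a,\u202a\nNepal"
print(split_locations(area_result))
```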
Upvotes: 0
Reputation: 16081
There is a one-line solution, where a is your string:
In [38]: [i.replace(" ","") for i in filter(None,(a.decode('unicode_escape').encode('ascii','ignore')).split('\n'))]
Out[38]: ['Al Bayan,', 'Nepal']
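The one-liner above relies on Python 2, where byte strings have a decode method. A rough Python 3 sketch of the same idea (discard everything outside ASCII, then split on newlines) might look like the following. The function name ascii_lines and the sample value of a are assumptions for illustration, and the trailing comma is stripped as well so the result matches the format asked for in the question.

```python
def ascii_lines(a):
    """Keep only ASCII characters, then return the non-empty lines."""
    # encode(..., "ignore") silently drops every non-ASCII character.
    ascii_only = a.encode("ascii", "ignore").decode("ascii")
    # Trim whitespace and a trailing comma from each non-empty line.
    return [line.strip().rstrip(",")
            for line in ascii_only.splitlines() if line.strip()]


# Hypothetical sample imitating the scraped text (assumed, not taken from the page).
a = "Al Bayan\u202a,\u202a\nNepal"
print(ascii_lines(a))
```

Note that this approach would also strip legitimate non-ASCII letters from place names, so it only suits data known to be plain ASCII.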
Upvotes: 0