Panetta

Reputation: 79

Removing extra characters from a string with Python string functions

Here is the HTML from which I want to extract the location information.

<div class="location">
    <div class="listing-location">Location</div>
    <div class="location-areas">
    <span class="location">Al Bayan</span>
    ‪,‪
    <span class="location">Nepal</span>
    </div>
    <div class="area-description"> 3.3 km from Mall of the Emirates </div>
    </div>

The Python BeautifulSoup4 code I used is:

    try:
        title = soup.find('span', {'id': 'listing-title-wrap'})
        title_result = str(title.get_text().strip())
        print "Title: ", title_result
    except StandardError as e:
        title_result = "Error was {0}".format(e)
        print title_result

Output:

"Al Bayanأ¢â‚¬آھ,أ¢â‚¬آھ

                            Nepal"

How can I convert this output into the following list?

['Al Bayan', 'Nepal']

What should the second line of the code be to get this output?

Upvotes: 1

Views: 148

Answers (3)

Keatinge

Reputation: 4341

You're reading the wrong element. Just read the spans with class "location":

from bs4 import BeautifulSoup

# html is the page markup shown in the question
soup = BeautifulSoup(html, "html.parser")
locList = [loc.text for loc in soup.find_all("span", {"class" : "location"})]
print(locList)

This prints exactly what you wanted:

['Al Bayan', 'Nepal']

Upvotes: 1

3kt

Reputation: 2553

You can use a regular expression to keep only letters and spaces:

>>> import re
>>> re.findall('[A-Za-z ]+', area_result)
['Al Bayan', ' Nepal']

Hope it'll be helpful.
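
As a minimal sketch of the regex approach, the snippet below also strips the padding that the newline and indentation leave around each match. The sample value of area_result is an assumption that mimics the scraped text, with U+202A marks around the comma. Note that [A-Za-z ] only keeps ASCII letters, so a place name with accented or non-Latin characters would be cut apart.

```python
import re

# Assumed sample input: the scraped text, with U+202A marks around the comma
# and the newline and indentation carried over from the markup.
area_result = "Al Bayan\u202a,\u202a\n\n                            Nepal"

# Find runs of ASCII letters and spaces, trim each run, and drop any run
# that was whitespace only.
matches = re.findall(r"[A-Za-z ]+", area_result)
locations = [m.strip() for m in matches if m.strip()]
print(locations)
```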

Upvotes: 0

Rahul K P

Reputation: 16081

There is a one-line solution (Python 2). Let a be your string.

In [38]: [i.replace("  ","") for i in filter(None,(a.decode('unicode_escape').encode('ascii','ignore')).split('\n'))]
Out[38]: ['Al Bayan,', 'Nepal']
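
The one-liner above depends on str.decode, which exists only in Python 2, and it leaves the comma attached to the first item. A possible Python 3 variant, sketched here under the assumption that the stray characters are Unicode format marks such as U+202A, filters them out with unicodedata and then splits on the comma. The sample value of a is an assumed stand-in for the scraped text.

```python
import unicodedata

# Assumed sample input: the scraped text with U+202A marks around the comma.
a = "Al Bayan\u202a,\u202a\n\n                            Nepal"

# Drop Unicode format characters (general category "Cf"), which covers U+202A.
cleaned = "".join(ch for ch in a if unicodedata.category(ch) != "Cf")

# Split on the comma and trim the whitespace around each piece.
locations = [part.strip() for part in cleaned.split(",") if part.strip()]
print(locations)
```

Unlike an ASCII-only filter, this keeps legitimate non-ASCII letters in place names, because only format characters are removed.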

Upvotes: 0
