Renato Sanhueza

Reputation: 564

Python: Transform a unicode variable into a string variable

I used a web crawler to get some data. I stored the data in a variable price. The type of price is:

<class 'bs4.element.NavigableString'>

The type of each element of price is:

<type 'unicode'>

Basically, price contains some whitespace and line feeds followed by $520. I want to strip all the extra symbols and recover only the number 520. I already wrote a naive solution:

def reducePrice(price):
    # Collect every character that comes after the first '$'.
    key = 0
    string = ""
    for i in price:
        if key == 1:
            string = string + i
        if i == '$':
            key = 1
    return string

But I want a more elegant solution: convert price to str and then use str methods to manipulate it. I have searched the web and other posts on the forum extensively. The best I found was that using:

p = "".join(price)

I can generate one big unicode value. If you can give me a hint I would be grateful (I'm using Python 2.7 on Ubuntu).

Edit: I am adding my spider in case you need it:

import requests
from bs4 import BeautifulSoup

def spider(max_pages):
    page = 1
    while page <= max_pages:
        url = "http://www.lider.cl/walmart/catalog/product/productDetails.jsp?cId=CF_Nivel2_000021&productId=PROD_5913&skuId=5913&pId=CF_Nivel1_000004&navAction=jump&navCount=12"
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        title = ""
        price = ""
        # Keep the last product name found on the page.
        for link in soup.findAll('span', {'itemprop': 'name'}):
            title = link.string
        # Iterate over the children of the price element.
        for link in soup.find('em', {'class': 'oferLowPrice fixPriceOferUp  '}):
            price = link.string

        print(title + '=' + str(reducePrice(price)))
        page += 1

spider(1)

Edit 2: Thanks to Martin and mASOUD, I arrived at a solution using str methods:

def reducePrice(price):
   return int((("".join(("".join(price)).split())).replace("$","")).encode())

This method returns an int. That was not my original question, but it was the next step in my project. I added it because I was unable to cast the unicode value to int directly, whereas calling encode() to produce a str first worked for me.
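For readability, the one-liner can be unpacked into separate steps. This is only a sketch: the name reduce_price_steps and the sample input are made up for illustration, and it assumes the value holds whitespace, one "$" and ASCII digits, as described above.

```python
def reduce_price_steps(price):
    # Hypothetical helper: same steps as the one-liner, one per line.
    # split() with no arguments cuts on any whitespace, including line feeds,
    # so joining the pieces back together removes all of it.
    compact = u"".join(price.split())
    # Drop the currency symbol.
    digits = compact.replace(u"$", u"")
    # Mirror the encode() step from the one-liner before converting.
    return int(digits.encode("ascii"))


print(reduce_price_steps(u"\n   $520  \n"))
```

With the sample input this prints 520.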

Upvotes: 1

Views: 471

Answers (1)

Martin Konecny

Reputation: 59671

Use a RegEx to extract the price from your Unicode string:

import re

def reducePrice(price):
    match = re.search(r'\d+', price)  # first run of digits, e.g. in u'  $500  '
    price = match.group()  # returns u"500"
    price = str(price) # convert "500" in unicode to single-byte characters.
    return price

Even though this function converts Unicode to a "regular" string as you asked, is there any reason you want this? Unicode strings can be worked with the same way as regular strings. That is, u"500" is almost the same as "500".
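To illustrate that point, here is a small sketch that never leaves Unicode. The name extract_price and the sample value are hypothetical, and the None return for input without digits is my own choice rather than something from the question.

```python
import re


def extract_price(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r"\d+", text)
    if match is None:
        return None
    # int() accepts a unicode string of digits directly.
    return int(match.group())


sample = u"\n   $520  \n"  # stand-in for the scraped value

print(extract_price(sample))             # regex route
print(int(sample.strip().lstrip(u"$")))  # plain string-method route
```

Both lines print 520 for this sample. The string-method route assumes the "$" comes first once the whitespace is stripped; the regex route does not care what surrounds the digits.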

Upvotes: 2
