sinG20

Reputation: 151

How can I write better code when looking for substrings?

I want to extract the prices (including the $ sign) from a list of strings and split them into two separate lists. My code below does this, but is there a better way to write it?

The list is as below:

['\n\n\t\t\t\t\t$59.90\n\t\t\t\t\n\n\n\t\t\t\t\t\t$68.00\n\t\t\t\t\t\n\n',
 '\n\n\t\t\t\t\t$55.00\n\t\t\t\t\n\n\n\t\t\t\t\t\t$68.00\n\t\t\t\t\t\n\n',
 '\n\n\t\t\t\t\t$38.50\n\t\t\t\t\n\n\n\t\t\t\t\t\t$49.90\n\t\t\t\t\t\n\n',
 '\n\n\t\t\t\t\t$49.00\n\t\t\t\t\n\n\n\t\t\t\t\t\t$62.00\n\t\t\t\t\t\n\n',
 '\n\n\t\t\t\t\t$68.80\n\t\t\t\t\n\n',
 '\n\n\t\t\t\t\t$49.80\n\t\t\t\t\n\n\n\t\t\t\t\t\t$60.50\n\t\t\t\t\t\n\n']

Python code:

pp_list = []
up_list = []

for u in usual_price_list:
    rep = u.replace("\n","")
    rep = rep.replace("\t","")
    s = rep.rsplit("$",1)
    pp_list.append(s[0])
    up_list.append("$"+s[1])

Upvotes: 0

Views: 89

Answers (2)

BlueSheepToken

Reputation: 6099

For this kind of problem, I tend to use the re module a lot, as it is more readable, more maintainable, and does not depend on which characters surround what you are looking for:

import re

pp_list = []
up_list = []

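
for u in usual_price_list:
    # Find every "$NN.NN" price in the string, in order of appearance
    prices = re.findall(r"\$\d{2}\.\d{2}", u)
    length_prices = len(prices)
    if length_prices > 0:
        pp_list.append(prices[0])  # first price found
    if length_prices > 1:
        up_list.append(prices[1])  # second price, if present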

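Regular Expression Breakdown

  • \$ matches a literal dollar sign; an unescaped $ is the end-of-string anchor, so we need to escape it
  • \d matches any digit, so \d{2} matches exactly 2 digits
  • \. matches a literal dot; an unescaped . matches any character (except a newline by default), so we need to escape it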

If you want, you can change how many digits are allowed for the cents: \d{1,2} matches one or two digits, and \d* matches zero or more digits.
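A minimal sketch of a looser pattern, assuming prices can have any number of dollar digits and optional cents. The names PRICE_RE and split_prices, and the choice to pad a missing second price with None so both lists stay the same length, are assumptions for illustration and not part of the code above:

```python
import re

# Assumed pattern: "$", one or more digits, then optionally "." and 1-2 digits.
PRICE_RE = re.compile(r"\$\d+(?:\.\d{1,2})?")

def split_prices(raw_items):
    """Return (first_prices, second_prices), padding missing values with None."""
    first, second = [], []
    for item in raw_items:
        found = PRICE_RE.findall(item)
        first.append(found[0] if len(found) > 0 else None)
        second.append(found[1] if len(found) > 1 else None)
    return first, second

sample = [
    "\n\n\t\t\t\t\t$59.90\n\t\t\t\t\n\n\n\t\t\t\t\t\t$68.00\n\t\t\t\t\t\n\n",
    "\n\n\t\t\t\t\t$68.80\n\t\t\t\t\n\n",
]
pp_list, up_list = split_prices(sample)
print(pp_list)  # ['$59.90', '$68.80']
print(up_list)  # ['$68.00', None]
```

Padding with None keeps index i of both lists pointing at the same item, which matters if the two lists are later zipped or put into a table.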

Upvotes: 4

Daweo

Reputation: 36440

As already pointed out, the re module is useful for this task. I would use re.split in the following way:

import re
data = ['\n\n\t\t\t\t\t$59.90\n\t\t\t\t\n\n\n\t\t\t\t\t\t$68.00\n\t\t\t\t\t\n\n',
 '\n\n\t\t\t\t\t$55.00\n\t\t\t\t\n\n\n\t\t\t\t\t\t$68.00\n\t\t\t\t\t\n\n',
 '\n\n\t\t\t\t\t$38.50\n\t\t\t\t\n\n\n\t\t\t\t\t\t$49.90\n\t\t\t\t\t\n\n',
 '\n\n\t\t\t\t\t$49.00\n\t\t\t\t\n\n\n\t\t\t\t\t\t$62.00\n\t\t\t\t\t\n\n',
 '\n\n\t\t\t\t\t$68.80\n\t\t\t\t\n\n',
 '\n\n\t\t\t\t\t$49.80\n\t\t\t\t\n\n\n\t\t\t\t\t\t$60.50\n\t\t\t\t\t\n\n']
prices = [re.split(r'[\n\t]+',i) for i in data]
prices0 = [i[1] for i in prices]
prices1 = [i[2] for i in prices]
print(prices0)
print(prices1)

Output:

['$59.90', '$55.00', '$38.50', '$49.00', '$68.80', '$49.80']
['$68.00', '$68.00', '$49.90', '$62.00', '', '$60.50']

Note that this works only if the strings contain nothing but \n, \t, and the prices, with at least one \n or \t before the first price and at least one \n or \t between prices.

[\n\t]+ matches any run of one or more \n or \t characters, that is \n, \t, \n\n, \t\t, \n\t, \t\n, and so on.
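To see why the prices sit at indexes 1 and 2, here is a small sketch of what re.split returns for one two-price string and one single-price string from the data. The variable names are only illustrative:

```python
import re

two_prices = "\n\n\t\t\t\t\t$59.90\n\t\t\t\t\n\n\n\t\t\t\t\t\t$68.00\n\t\t\t\t\t\n\n"
one_price = "\n\n\t\t\t\t\t$68.80\n\t\t\t\t\n\n"

# The separator matches at the very start and end of the string,
# so re.split yields an empty string at both ends of the result.
parts_two = re.split(r"[\n\t]+", two_prices)
parts_one = re.split(r"[\n\t]+", one_price)
print(parts_two)  # ['', '$59.90', '$68.00', '']
print(parts_one)  # ['', '$68.80', '']

# Dropping the empty strings leaves only the prices.
print([p for p in parts_two if p])  # ['$59.90', '$68.00']
```

For the single-price string, index 2 is the trailing empty string, which is where the '' in the second output list comes from.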

Upvotes: 0
