Reputation: 151
I want to extract the currency (along with the $ sign) from a list, and create two different currency lists which I have done. But is there a better way to code this?
The list is as below:
['\n\n\t\t\t\t\t$59.90\n\t\t\t\t\n\n\n\t\t\t\t\t\t$68.00\n\t\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$55.00\n\t\t\t\t\n\n\n\t\t\t\t\t\t$68.00\n\t\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$38.50\n\t\t\t\t\n\n\n\t\t\t\t\t\t$49.90\n\t\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$49.00\n\t\t\t\t\n\n\n\t\t\t\t\t\t$62.00\n\t\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$68.80\n\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$49.80\n\t\t\t\t\n\n\n\t\t\t\t\t\t$60.50\n\t\t\t\t\t\n\n']
Python code:
pp_list = []
up_list = []
for u in usual_price_list:
rep = u.replace("\n","")
rep = rep.replace("\t","")
s = rep.rsplit("$",1)
pp_list.append(s[0])
up_list.append("$"+s[1])
Upvotes: 0
Views: 89
Reputation: 6099
For this kind of problem, I tend to use a lot the re
module, as it is more readable, more maintainble and does not depend on which character surround what you are looking for :
import re
pp_list = []
up_list = []
for u in usual_price_list:
prices = re.findall(r"\$\d{2}\.\d{2}", u)
length_prices = len(prices)
if length_prices > 0:
pp_list.append(prices[0])
if length_prices > 1:
up_list.append(prices[1])
$
is the end of string character, so we need to escape it\d
matches any digit, so \d{2}
matches exactly 2 digits.
matches any character, so we need to escape itIf you want it you can modify the number of digits for the cents with \d{1,2}
for matches one or two digits, or \d*
to match 0 digit or more
Upvotes: 4
Reputation: 36440
As already pointed for doing that task re
module is useful - I would use re.split
following way:
import re
data = ['\n\n\t\t\t\t\t$59.90\n\t\t\t\t\n\n\n\t\t\t\t\t\t$68.00\n\t\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$55.00\n\t\t\t\t\n\n\n\t\t\t\t\t\t$68.00\n\t\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$38.50\n\t\t\t\t\n\n\n\t\t\t\t\t\t$49.90\n\t\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$49.00\n\t\t\t\t\n\n\n\t\t\t\t\t\t$62.00\n\t\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$68.80\n\t\t\t\t\n\n',
'\n\n\t\t\t\t\t$49.80\n\t\t\t\t\n\n\n\t\t\t\t\t\t$60.50\n\t\t\t\t\t\n\n']
prices = [re.split(r'[\n\t]+',i) for i in data]
prices0 = [i[1] for i in prices]
prices1 = [i[2] for i in prices]
print(prices0)
print(prices1)
Output:
['$59.90', '$55.00', '$38.50', '$49.00', '$68.80', '$49.80']
['$68.00', '$68.00', '$49.90', '$62.00', '', '$60.50']
Note that this will work assuming that there are solely \n
and \t
excluding prices and there is at least one \n
or \t
before first price and at least one \n
or \t
between prices.
[\n\t]+
denotes any string made from \n
or \t
with length 1 or greater, that is \n
, \t
, \n\n
, \t\t
, \n\t
, \t\n
and so on
Upvotes: 0