Reputation: 13
I wrote a small Python script that converts a list of strings (of any size; in this case the list a below) into a list of floats and prints it. Each string is either a plain number or a number followed by a word (million, billion, trillion).
Assume that the phrases 'million', 'billion', and 'trillion' are the only terms that can be used and they are always separated with a space from the number (if there is a number).
The code is below. Is there any way to make the script more concise and efficient?
a = ["10", "1000", "1.684 million", "356852", "2.5 billion", "3 trillion"]
for i in range(len(a)):
    num_phrase = ''
    if ' ' in a[i]:
        num_phrase = a[i].split(" ")[1]
    if num_phrase == "million":
        a[i] = float(a[i].split(" ")[0]) * 1000000
    elif num_phrase == "billion":
        a[i] = float(a[i].split(" ")[0]) * 1000000000
    elif num_phrase == "trillion":
        a[i] = float(a[i].split(" ")[0]) * 1000000000000
    else:
        a[i] = float(a[i].split(" ")[0])
print(list(a))
Upvotes: 0
Views: 198
Reputation: 2237
The code in my previous answer is close to how I would solve the problem in production: it tries to be clear and adds checks so that invalid strings raise errors, which lowers performance.
If we want the fastest possible solution (and don't care much about validating input), we can use the following approach, which keeps string manipulation to a minimum.
a = ["10", "1000", "1.684 million", "356852", "2.5 billion", "3 trillion"]

def convert(input):
    # Each slice drops the unit together with its leading space.
    if " million" in input:
        return float(input[:-8]) * 1000000.0
    elif " billion" in input:
        return float(input[:-8]) * 1000000000.0
    elif " trillion" in input:
        return float(input[:-9]) * 1000000000000.0
    else:
        return float(input)

print([convert(e) for e in a])
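One caveat with the in checks above: they match the unit anywhere in the string, while the slices assume it sits at the very end. Here is a minimal sketch of a variant that tests the suffix with str.endswith instead. The names convert_suffix and SUFFIXES are mine, not part of the answer, and I have not benchmarked this version.

```python
# Hypothetical variant of convert(); convert_suffix and SUFFIXES are
# illustrative names, not taken from the answer above.
SUFFIXES = (
    (" million", 1e6),
    (" billion", 1e9),
    (" trillion", 1e12),
)

def convert_suffix(text):
    for suffix, factor in SUFFIXES:
        if text.endswith(suffix):
            # Slice off exactly the suffix length instead of a hard-coded count.
            return float(text[:-len(suffix)]) * factor
    # No known unit at the end: treat the whole string as a number.
    return float(text)

a = ["10", "1000", "1.684 million", "356852", "2.5 billion", "3 trillion"]
print([convert_suffix(e) for e in a])
```

Deriving the slice length from the suffix also means a new unit only needs a new table entry.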
Benchmark results (thanks to Kelly Bundy):
Round 1  Round 2  Round 3
4224 us  4164 us  4170 us  original
3129 us  3121 us  3180 us  Kelly1
3043 us  3176 us  3100 us  Kelly2
4381 us  4425 us  4345 us  dim_an
4053 us  4089 us  4119 us  motyzk
2160 us  2187 us  2169 us  dim_an2
  12 us    12 us    12 us  baseline
Upvotes: 0
Reputation: 27609
Could use a dict:
d = {'': 1, 'm': 1e6, 'b': 1e9, 't': 1e12}
a = [float(number) * d[unit[:1]]
     for s in a
     for number, _, unit in [s.partition(' ')]]
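In case the partition trick is not obvious: str.partition always returns a 3-tuple, and when there is no space the separator and the tail are empty strings, so unit[:1] is '' and picks the multiplier 1. A small sketch to see the lookup keys (the helper name unit_key is only for illustration):

```python
def unit_key(s):
    # partition(' ') splits at the first space into (head, separator, tail);
    # without a space it returns (s, '', '').
    number, _, unit = s.partition(' ')
    # First letter of the unit, or '' when there is no unit.
    return unit[:1]

for s in ["10", "1.684 million", "2.5 billion", "3 trillion"]:
    print(repr(s), '->', s.partition(' '), 'key:', repr(unit_key(s)))
```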
Or replace those illions with scientific notation:
a = [float(s.replace(' million', 'e6')
            .replace(' billion', 'e9')
            .replace(' trillion', 'e12'))
     for s in a]
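The second version relies on float() accepting strings in scientific notation, so '2.5 billion' becomes '2.5e9' and is parsed in one step with no multiplication. A small sketch that prints the intermediate strings (the function name to_sci is just for this example):

```python
def to_sci(s):
    # Rewrite the unit word, including its leading space, as an exponent suffix.
    return (s.replace(' million', 'e6')
             .replace(' billion', 'e9')
             .replace(' trillion', 'e12'))

for s in ["10", "1.684 million", "2.5 billion", "3 trillion"]:
    rewritten = to_sci(s)
    print(repr(s), '->', repr(rewritten), '->', float(rewritten))
```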
Benchmark results with your list times 1000:
Round 1  Round 2  Round 3
3640 us  3618 us  3555 us  original
2747 us  2738 us  2706 us  Kelly1
2258 us  2272 us  2214 us  Kelly2
3759 us  3841 us  3802 us  dim_an
3495 us  3542 us  3562 us  motyzk
Benchmark code:
from timeit import timeit

def baseline(a):
    pass

def original(a):
    for i in range(len(a)):
        num_phrase = ''
        if ' ' in a[i]:
            num_phrase = a[i].split(" ")[1]
        if num_phrase == "million":
            a[i] = float(a[i].split(" ")[0]) * 1000000
        elif num_phrase == "billion":
            a[i] = float(a[i].split(" ")[0]) * 1000000000
        elif num_phrase == "trillion":
            a[i] = float(a[i].split(" ")[0]) * 1000000000000
        else:
            a[i] = float(a[i].split(" ")[0])
    return a

def Kelly1(a):
    d = {'': 1, 'm': 1e6, 'b': 1e9, 't': 1e12}
    return [float(number) * d[unit[:1]]
            for s in a
            for number, _, unit in [s.partition(' ')]]

def Kelly2(a):
    return [float(s.replace(' million', 'e6')
                   .replace(' billion', 'e9')
                   .replace(' trillion', 'e12'))
            for s in a]
def dim_an(a):
    multipliers = {
        "million": 10 ** 6,
        "billion": 10 ** 9,
        "trillion": 10 ** 12,
    }
    for i in range(len(a)):
        words = a[i].split()
        if len(words) == 0 or len(words) > 2:
            # Report the offending string (the original referenced an undefined name e).
            raise ValueError("Bad string: " + a[i])
        result = float(words[0])
        if len(words) == 2:
            result *= multipliers[words[1]]
        a[i] = result
    return a
def motyzk(a):
    str_to_num = {
        "": 1,
        "million": 1000000,
        "billion": 1000000000,
        "trillion": 1000000000000,
    }
    for i in range(len(a)):
        num_phrase = ''
        if ' ' in a[i]:
            num_phrase = a[i].split(" ")[1]
        a[i] = float(a[i].split(" ")[0]) * str_to_num[num_phrase]
    return a
# config
funcs = original, Kelly1, Kelly2, dim_an, motyzk, baseline
a = ["10", "1000", "1.684 million", "356852", "2.5 billion", "3 trillion"] * 1000
number = 100

# correctness
expect = original(a.copy())
for func in funcs:
    result = func(a.copy())
    print(result == expect, func.__name__)

# speed
tss = [[] for _ in funcs]
for _ in range(3):
    print('Round 1  Round 2  Round 3')
    for func, ts in zip(funcs, tss):
        t = timeit(lambda: func(a.copy()), number=number) / number
        ts.append(t)
        print(*('%4d us ' % (t * 1e6) for t in ts), func.__name__)
    print()
Upvotes: 2
Reputation: 384
a = ["10", "1000", "1.684 million", "356852", "2.5 billion", "3 trillion"]
str_to_num = {
    "": 1,
    "million": 1000000,
    "billion": 1000000000,
    "trillion": 1000000000000,
}
for i in range(len(a)):
    num_phrase = ''
    if ' ' in a[i]:
        num_phrase = a[i].split(" ")[1]
    a[i] = float(a[i].split(" ")[0]) * str_to_num[num_phrase]
print(a)
Upvotes: 0
Reputation: 2237
If we talk about readability and performance, I would change two things here:
1. Don't call str.split multiple times; it's costly.
2. Replace the ifs with a dict. It's not necessarily a speedup when you have a small number of branches, but in my view it makes the code more readable and helps when you have more strings.
a = ["10", "1000", "1.684 million", "356852", "2.5 billion", "3 trillion"]
multipliers = {
    "million": 10 ** 6,
    "billion": 10 ** 9,
    "trillion": 10 ** 12,
}
for i in range(len(a)):
    words = a[i].split()
    if len(words) == 0 or len(words) > 2:
        # Report the offending string (the original referenced an undefined name e).
        raise ValueError("Bad string: " + a[i])
    result = float(words[0])
    if len(words) == 2:
        result *= multipliers[words[1]]
    a[i] = result
print(a)
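To show what the extra checks buy you, here is a rough sketch of the same logic wrapped in a function and fed a few malformed strings. The function name parse_amount and the sample bad inputs are my own choices for illustration.

```python
MULTIPLIERS = {
    "million": 10 ** 6,
    "billion": 10 ** 9,
    "trillion": 10 ** 12,
}

def parse_amount(text):
    # Hypothetical wrapper around the loop body above.
    words = text.split()
    if len(words) == 0 or len(words) > 2:
        raise ValueError("Bad string: " + text)
    result = float(words[0])  # raises ValueError for non-numeric text
    if len(words) == 2:
        result *= MULTIPLIERS[words[1]]  # raises KeyError for unknown units
    return result

# Each malformed input fails loudly instead of producing a wrong number.
for bad in ["", "1 2 3", "abc", "5 zillion"]:
    try:
        parse_amount(bad)
    except (ValueError, KeyError) as exc:
        print(repr(bad), "->", type(exc).__name__)
```

An unknown unit surfaces as a KeyError rather than a ValueError; if you want a single exception type, you could catch the KeyError inside the function and re-raise it as ValueError.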
Upvotes: 0