John Wax

Reputation: 13

How to make this Python string-to-float function more efficient?

I created a small Python script that converts a list of strings (of any size; `a` in this case) into a list of floats and prints it. Each string is either a plain number or a number followed by a word (million, billion, or trillion).

Assume that the phrases 'million', 'billion', and 'trillion' are the only terms that can be used and they are always separated with a space from the number (if there is a number).

The code is below. Is there any way to make the script more concise and efficient?


a = ["10", "1000" , "1.684 million", "356852", "2.5 billion", "3 trillion"]

for i in range(len(a)):

  num_phrase=''

  if ' ' in a[i]:
    num_phrase=a[i].split(" ")[1]

  if num_phrase=="million":
    a[i]=float(a[i].split(" ")[0])*1000000
  elif num_phrase=="billion":
    a[i]=float(a[i].split(" ")[0])*1000000000
  elif num_phrase=="trillion":
    a[i]=float(a[i].split(" ")[0])*1000000000000
  else:
    a[i]=float(a[i].split(" ")[0])


print(list(a))

Upvotes: 0

Views: 198

Answers (4)

Dmitry Ermolov

Reputation: 2237

The code in my other answer is close to how I would solve the problem in production: it aims to be clear and adds checks so that invalid strings raise errors, which lowers performance.

If we want the fastest possible solution (and don't care much about validating input), we can use the following approach, which keeps string manipulation to a minimum.

a = ["10", "1000" , "1.684 million", "356852", "2.5 billion", "3 trillion"]
def convert(s):
    # Slice off the unit word (and its leading space) by length, then scale.
    if " million" in s:
        return float(s[:-8]) * 1000000.0
    elif " billion" in s:
        return float(s[:-8]) * 1000000000.0
    elif " trillion" in s:
        return float(s[:-9]) * 1000000000000.0
    else:
        return float(s)
print([convert(e) for e in a])
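
A possible variant of the same idea, sketched under the assumption that the unit word is always the last thing in the string: `str.endswith` only inspects the tail, and deriving the slice length from the suffix avoids hard-coding 8 and 9. The names `SUFFIXES` and `convert_suffix` are hypothetical, and this version is not one of the benchmarked functions.

```python
# Hypothetical names; (suffix, multiplier) pairs checked in order.
SUFFIXES = (
    (" million", 1e6),
    (" billion", 1e9),
    (" trillion", 1e12),
)

def convert_suffix(s):
    for suffix, factor in SUFFIXES:
        if s.endswith(suffix):
            # Drop the suffix (including its leading space) before parsing.
            return float(s[:-len(suffix)]) * factor
    # No unit word: the whole string is the number.
    return float(s)

a = ["10", "1000", "1.684 million", "356852", "2.5 billion", "3 trillion"]
print([convert_suffix(e) for e in a])
```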

Benchmark results (thanks to Kelly Bundy):

Round 1  Round 2  Round 3
4224 us  4164 us  4170 us  original
3129 us  3121 us  3180 us  Kelly1
3043 us  3176 us  3100 us  Kelly2
4381 us  4425 us  4345 us  dim_an
4053 us  4089 us  4119 us  motyzk
2160 us  2187 us  2169 us  dim_an2
  12 us    12 us    12 us  baseline


Upvotes: 0

Kelly Bundy

Reputation: 27609

Could use a dict:

d = {'': 1, 'm': 1e6, 'b': 1e9, 't': 1e12}
a = [float(number) * d[unit[:1]]
     for s in a
     for number, _, unit in [s.partition(' ')]]

Or replace those illions with scientific notation:

a = [float(s.replace(' million', 'e6')
            .replace(' billion', 'e9')
            .replace(' trillion', 'e12'))
     for s in a]
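
The second variant works because `float()` accepts exponent notation, so once the unit word is replaced, a string such as "1.684 million" has become "1.684e6". A minimal sketch on single strings (the helper name `to_sci` is hypothetical and only there to show the intermediate string):

```python
def to_sci(s):
    # Hypothetical helper: rewrite the unit word as an exponent suffix.
    return (s.replace(' million', 'e6')
             .replace(' billion', 'e9')
             .replace(' trillion', 'e12'))

print(to_sci('1.684 million'))         # 1.684e6
print(float(to_sci('1.684 million')))  # 1684000.0
print(float(to_sci('356852')))         # 356852.0
```

Strings without a unit word pass through the three `replace` calls unchanged.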

Benchmark results with your list times 1000:

Round 1  Round 2  Round 3
3640 us  3618 us  3555 us  original
2747 us  2738 us  2706 us  Kelly1
2258 us  2272 us  2214 us  Kelly2
3759 us  3841 us  3802 us  dim_an
3495 us  3542 us  3562 us  motyzk

Benchmark code:

from timeit import timeit

def baseline(a):
    pass

def original(a):
 for i in range(len(a)):
  num_phrase=''
  if ' ' in a[i]:
    num_phrase=a[i].split(" ")[1]
  if num_phrase=="million":
    a[i]=float(a[i].split(" ")[0])*1000000
  elif num_phrase=="billion":
    a[i]=float(a[i].split(" ")[0])*1000000000
  elif num_phrase=="trillion":
    a[i]=float(a[i].split(" ")[0])*1000000000000
  else:
    a[i]=float(a[i].split(" ")[0])
 return a

def Kelly1(a):
    d = {'': 1, 'm': 1e6, 'b': 1e9, 't': 1e12}
    return [float(number) * d[unit[:1]]
            for s in a
            for number, _, unit in [s.partition(' ')]]

def Kelly2(a):
    return [float(s.replace(' million', 'e6')
                   .replace(' billion', 'e9')
                   .replace(' trillion', 'e12'))
            for s in a]

def dim_an(a):
 multipliers = {
    "million":  10 ** 6,
    "billion":  10 ** 9,
    "trillion": 10 ** 12,
 }
 for i in range(len(a)):
    words = a[i].split()
    if len(words) == 0 or len(words) > 2:
        raise ValueError("Bad string: " + a[i])

    result = float(words[0])
    if len(words) == 2:
        result *= multipliers[words[1]]
    a[i] = result
 return a

def motyzk(a):
 str_to_num = {
    "": 1,
    "million": 1000000,
    "billion": 1000000000,
    "trillion": 1000000000000,
 }
 for i in range(len(a)):
  num_phrase=''
  if ' ' in a[i]:
    num_phrase=a[i].split(" ")[1]
  a[i]=float(a[i].split(" ")[0])*str_to_num[num_phrase]
 return a

# config
funcs = original, Kelly1, Kelly2, dim_an, motyzk, baseline
a = ["10", "1000" , "1.684 million", "356852", "2.5 billion", "3 trillion"] * 1000
number = 100

# correctness
expect = original(a.copy())
for func in funcs:
    result = func(a.copy())
    print(result == expect, func.__name__)

# speed
tss = [[] for _ in funcs]
for _ in range(3):
    print('Round 1  Round 2  Round 3')
    for func, ts in zip(funcs, tss):
        t = timeit(lambda: func(a.copy()), number=number) / number
        ts.append(t)
        print(*('%4d us ' % (t * 1e6) for t in ts), func.__name__)
    print()

Upvotes: 2

motyzk

Reputation: 384

a = ["10", "1000" , "1.684 million", "356852", "2.5 billion", "3 trillion"]

str_to_num = {
    "": 1,
    "million": 1000000,
    "billion": 1000000000,
    "trillion": 1000000000000,
}

for i in range(len(a)):
  
  num_phrase=''

  if ' ' in a[i]:
    num_phrase=a[i].split(" ")[1]

  a[i]=float(a[i].split(" ")[0])*str_to_num[num_phrase]



print(a)

Upvotes: 0

Dmitry Ermolov

Reputation: 2237

For readability and performance, I would change two things here:

  1. I wouldn't call str.split multiple times; it's costly.
  2. I would probably replace the chain of ifs with a dict. That isn't necessarily a speedup when there are only a few branches, but in my view it makes the code more readable and helps when you have more strings.

a = ["10", "1000" , "1.684 million", "356852", "2.5 billion", "3 trillion"]

multipliers = {
    "million":  10 ** 6,
    "billion":  10 ** 9,
    "trillion": 10 ** 12,
}

for i in range(len(a)):
    words = a[i].split()
    if len(words) == 0 or len(words) > 2:
        raise ValueError("Bad string: " + a[i])

    result = float(words[0])
    if len(words) == 2:
        result *= multipliers[words[1]]
    a[i] = result

print(a)
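
The same logic could also be wrapped in a function that returns a new list instead of modifying `a` in place. This is only a sketch: the names `MULTIPLIERS` and `parse_number` are hypothetical, and turning an unknown unit word into a `ValueError` (rather than the `KeyError` a plain dict lookup raises) is an assumption about the desired behavior.

```python
# Hypothetical names; same multipliers as in the loop above.
MULTIPLIERS = {
    "million":  10 ** 6,
    "billion":  10 ** 9,
    "trillion": 10 ** 12,
}

def parse_number(s):
    words = s.split()  # split once and reuse the result
    if len(words) not in (1, 2):
        raise ValueError("Bad string: " + s)
    result = float(words[0])
    if len(words) == 2:
        try:
            result *= MULTIPLIERS[words[1]]
        except KeyError:
            # Assumption: report unknown unit words the same way as other bad input.
            raise ValueError("Bad string: " + s) from None
    return result

a = ["10", "1000", "1.684 million", "356852", "2.5 billion", "3 trillion"]
print([parse_number(s) for s in a])
```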

Upvotes: 0
