Python Hunter

Reputation: 27

Is there a way to make for loops faster?

I want to iterate over a list that contains only numbers, test each value against a condition, and add the value to a new list if it passes. Unfortunately, I don't think I can use a list comprehension, because not all values will be added to the same list.

I want to be able to do this:

def sort(values: list):
    sum_0 = sum(values)
    len_0 = len(values)
    average_0 = sum_0 / len_0
    lesser_list_0 = []
    greater_list_0 = []
    for value in values:
        if value >= average_0:
            greater_list_0.append(value)
        else:
            lesser_list_0.append(value)
    # Return both lists so the caller can use the result
    return lesser_list_0, greater_list_0

I want to do this without being slowed down by the for loop. Also, is there a faster way to add a value to the end of either list than the append method?

Upvotes: 0

Views: 91

Answers (3)

tdelaney

Reputation: 77407

List comprehensions are loops too, and all you really save is the lookup of greater_list_0.append or lesser_list_0.append on each iteration. Because you need two lists, you need two comprehensions, which means two passes over the data, so the single for loop ends up faster. You can save a trivial amount of time by looking up the two append methods once, before the loop. For the three scenarios shown below, the timings on my machine (seconds for 100 calls on a shuffled 100,000-element list) are:

for loop 1.0464496612548828
comprehensions 1.1907751560211182
less lookup 0.9023218154907227

And the test code is

import random
import time

def sort(values: list):
    sum_0 = sum(values)
    len_0 = len(values)
    average_0 = sum_0 / len_0
    greater_list_0 = []
    lesser_list_0 = []
    for value in values:
        if value >= average_0:
            greater_list_0.append(value)
        else:
            lesser_list_0.append(value)

def sort2(values: list):
    sum_0 = sum(values)
    len_0 = len(values)
    average_0 = sum_0 / len_0
    greater_list_0 = [val for val in values if val >= average_0]
    lesser_list_0 = [val for val in values if val < average_0]

def sort_less_lookup(values: list):
    sum_0 = sum(values)
    len_0 = len(values)
    average_0 = sum_0 / len_0
    greater_list_0 = []
    lesser_list_0 = []
    g_append = greater_list_0.append
    l_append = lesser_list_0.append
    for value in values:
        if value >= average_0:
            g_append(value)
        else:
            l_append(value)

values = list(range(100000))
random.shuffle(values)

tries = 100
start = time.time()
for _ in range(tries):
    sort(values)
delta = time.time() - start
print('for loop', delta)

start = time.time()
for _ in range(tries):
    sort2(values)
delta = time.time() - start
print('comprehensions', delta)

start = time.time()
for _ in range(tries):
    sort_less_lookup(values)
delta = time.time() - start
print('less lookup', delta)
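time.time() measures wall-clock time, so other activity on the machine can skew a single run. The sketch below runs the same comparison with the standard-library timeit module and keeps the best of several repeats. The function names split_loop, split_comprehensions and split_less_lookup are hypothetical stand-ins for sort, sort2 and sort_less_lookup above, with return values added so the block runs on its own. Absolute numbers will differ from machine to machine.

```python
import random
import timeit


def split_loop(values):
    """Single pass with a plain for loop and list.append."""
    average = sum(values) / len(values)
    greater, lesser = [], []
    for value in values:
        if value >= average:
            greater.append(value)
        else:
            lesser.append(value)
    return lesser, greater


def split_comprehensions(values):
    """Two comprehensions, i.e. two passes over the data."""
    average = sum(values) / len(values)
    greater = [v for v in values if v >= average]
    lesser = [v for v in values if v < average]
    return lesser, greater


def split_less_lookup(values):
    """Single pass, with the bound append methods looked up once."""
    average = sum(values) / len(values)
    greater, lesser = [], []
    g_append = greater.append
    l_append = lesser.append
    for value in values:
        if value >= average:
            g_append(value)
        else:
            l_append(value)
    return lesser, greater


values = list(range(100_000))
random.shuffle(values)

for func in (split_loop, split_comprehensions, split_less_lookup):
    # 5 repeats of 10 calls each; the minimum is the least noisy estimate
    best = min(timeit.repeat(lambda: func(values), repeat=5, number=10))
    print(f"{func.__name__:22} {best:.3f} s per 10 calls")
```

All three functions return the same two lists in the same order, so only their speed differs.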

Upvotes: 0

M.qaemi Qaemi

Reputation: 97

Yes, you can use the pandas and NumPy libraries for these operations. They are optimized for this kind of work: the data is stored in C data types, and vectorized operations run in compiled code instead of a Python-level loop.

https://pandas.pydata.org/pandas-docs/stable/10min.html

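You need to use boolean indexing (subsetting). It works roughly like this, but refer to the docs for the details:

specific_value = my_dataframe['values'].mean()
greater = my_dataframe[my_dataframe['values'] >= specific_value]
lesser = my_dataframe[my_dataframe['values'] < specific_value]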

You can calculate the mean very efficiently with DataFrame.mean(): https://www.geeksforgeeks.org/python-pandas-dataframe-mean/
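Here is a minimal sketch of the same boolean-indexing idea in pandas. It assumes the numbers live in a column named "values" of a DataFrame. The column name and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: one numeric column named "values"
my_dataframe = pd.DataFrame({"values": [3, 8, 1, 9, 4, 7]})

# Series.mean() runs in compiled code, not in a Python loop
average = my_dataframe["values"].mean()

# Comparing a column with a scalar gives a boolean Series (a mask);
# indexing the DataFrame with it keeps only the rows where the mask is True
mask = my_dataframe["values"] >= average
greater = my_dataframe[mask]
lesser = my_dataframe[~mask]  # ~ inverts the mask: values below the average

print("Average:", average)
print(greater["values"].tolist())  # [8, 9, 7]
print(lesser["values"].tolist())   # [3, 1, 4]
```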

Upvotes: 0

darcamo

Reputation: 3503

Since you need to read every value to perform this computation, you will need some kind of loop. What you don't want is a Python-level loop in numerical code where you care about speed.

I suggest you look into a specialized library for numerical computation, in particular NumPy. It has functions to compute the average easily, and its indexing is very powerful: you can index an array with a single value, with an array of integers, with an array of booleans, and so on.
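The three kinds of indexing mentioned above can be illustrated on a small, made-up array:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Indexing with a single integer returns one element
print(arr[2])                     # 30

# Indexing with an array of integers returns the elements at those positions
print(arr[np.array([0, 3, 4])])   # [10 40 50]

# Indexing with a boolean array of the same length keeps the elements where it is True
print(arr[np.array([True, False, True, False, False])])  # [10 30]

# The mean is computed without a Python-level loop
print(np.mean(arr))               # 30.0
```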

Check the code below, where we compare an array with a single scalar (the average) to get an array of booleans. Then we can use this array of booleans to only get the values in the original array where the corresponding booleans are True. This will give you exactly what you want.

import numpy as np


def separate_values(values: np.ndarray):
    average = np.mean(values)

    # This gives a Boolean array with the same shape as `values`,
    # True only in places where the value is lower than the average
    mask1 = values < average
    mask2 = np.logical_not(mask1)  # We could also just write `values >= average`

    # We can use the boolean mask to index the original array.
    # This gives us an array with the elements lower than the average
    lesser = values[mask1]
    # This gives us an array with the elements greater than or equal to the average
    greater = values[mask2]

    # Returns a tuple with both arrays
    return lesser, greater


if __name__ == '__main__':
    # A random array with 5 integers in the interval [0, 10)
    values = np.random.randint(0, 10, 5)

    lesser, greater = separate_values(values)

    print("Average:", np.mean(values))
    print("Values:", values)
    print("Values < average:", lesser)
    print("Values >= average:", greater)

You need to install NumPy for this to work. It can be installed easily through pip, conda, etc.
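The question starts from a plain Python list rather than an array. The sketch below is one way to handle that case if you want lists back. np.asarray converts the list once, and ndarray.tolist() converts each result back to Python numbers. The helper name split_list is hypothetical. Each conversion is itself a pass over the data, so this pays off mainly for large inputs or when the data is already kept in a NumPy array.

```python
import numpy as np


def split_list(values: list):
    """Hypothetical helper: split a list of numbers around its mean using NumPy."""
    arr = np.asarray(values)   # one conversion from list to ndarray
    mask = arr < arr.mean()    # boolean mask, computed in compiled code
    # Boolean indexing selects each side; tolist() converts back to Python numbers
    return arr[mask].tolist(), arr[~mask].tolist()


lesser, greater = split_list([4, 1, 7, 3, 10])
print(lesser, greater)  # [4, 1, 3] [7, 10]
```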

Upvotes: 2
