Sven

Reputation: 919

Get a random int from a set in Python: how to improve performance?

I currently use the following code to get a random int from a set, but the set is large, so picking a random element is really slow. Is there a better way?

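import csv
import random


def getRandomBook():
    # random.sample() needs a sequence; older Pythons copied a set into a tuple
    # internally, Python 3.11+ raises TypeError for sets, so convert explicitly
    return int(random.sample(tuple(getBookSet()), 1)[0])


def getBookSet(cleaned_sales_input="data/cleaned_sales.csv"):
    # Python 3's csv module expects a text-mode file opened with newline=""
    with open(cleaned_sales_input, newline="") as sales_file:
        sales = csv.reader(sales_file)
        return {int(row[6]) for row in sales}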
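A minimal timing sketch of why picking from the set is slow, using synthetic data instead of the CSV. The one-million-ID set is an assumed size, and pick_from_set / pick_from_list are hypothetical helper names. It leaves out re-reading the CSV, which getRandomBook() as written also does on every call and which adds further cost.

```python
import random
import timeit

# Synthetic stand-in for getBookSet(): one million unique integer IDs (assumed size)
book_set = set(range(1_000_000))
book_list = list(book_set)  # built once, outside the timed code


def pick_from_set():
    # The whole set is copied into a tuple on every call before sampling
    return random.sample(tuple(book_set), 1)[0]


def pick_from_list():
    # random.choice() only indexes the prebuilt list
    return random.choice(book_list)


if __name__ == "__main__":
    n = 20
    t_set = timeit.timeit(pick_from_set, number=n)
    t_list = timeit.timeit(pick_from_list, number=n)
    print(f"sample from set:  {t_set / n * 1e6:.1f} us per call")
    print(f"choice from list: {t_list / n * 1e6:.1f} us per call")
```

Absolute timings depend on the machine; what matters is that the set version does work proportional to the set size on each call, while the list version does a constant amount.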

Upvotes: 0

Views: 301

Answers (1)

Martijn Pieters

Reputation: 1124110

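Read the file only once, and turn the set into a list. random.sample() needs a sequence to pick from, so when given a set it first copies the whole set into a tuple just to pick one sample (Python 3.11 and later no longer accept sets and raise a TypeError instead). Avoid that overhead and use random.choice() on the list instead: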

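import random

books = None  # cached list of unique book IDs, filled on the first call


def getRandomBook():
    global books
    if books is None:
        # Read the CSV only once; a list supports the O(1) indexing random.choice() uses
        books = list(getBookSet())
    return random.choice(books)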
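If you would rather avoid the global, one alternative (a variant, not what the answer above uses) is to cache the loader with functools.lru_cache. The sketch assumes the same CSV layout as the question (book ID in column index 6); load_books and get_random_book are hypothetical names. It caches a tuple rather than a list so callers cannot mutate the shared cached value.

```python
import csv
import functools
import os
import random
import tempfile


@functools.lru_cache(maxsize=None)
def load_books(path):
    # Hypothetical helper: reads the CSV once per path; later calls hit the cache
    with open(path, newline="") as sales_file:
        return tuple({int(row[6]) for row in csv.reader(sales_file)})


def get_random_book(path="data/cleaned_sales.csv"):
    return random.choice(load_books(path))


if __name__ == "__main__":
    # Throwaway demo CSV; column index 6 holds the book ID, as in the question
    with tempfile.NamedTemporaryFile("w", suffix=".csv", newline="", delete=False) as f:
        writer = csv.writer(f)
        for book_id in [10, 20, 20, 30]:
            writer.writerow(["x"] * 6 + [book_id])
        demo_path = f.name
    print(get_random_book(demo_path))
    os.remove(demo_path)
```

load_books.cache_clear() forces a re-read if the CSV changes while the program runs.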

There is no need to call int() either, because getBookSet() already converted the values when reading them.

This speeds up repeated calls to getRandomBook(): only the first call reads the file and builds the list, and every later call is a single random.choice(). If you need to call it only once per run of your program, there is no way around reading the file, other than creating a simpler file that contains just the unique book values.
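One possible layout for such a "simpler file", sketched under assumptions the answer does not state: every unique ID fits in a signed 64-bit integer and is stored as a fixed-size binary record, so a single random pick can seek straight to one record instead of reading the whole file. write_id_file, random_id_from_file and the book_ids.bin name are hypothetical.

```python
import os
import random
import struct
import tempfile

RECORD = struct.Struct("<q")  # assumption: one little-endian signed 64-bit int per book ID


def write_id_file(ids, path):
    # One-time preprocessing: store each unique ID as a fixed-size record
    with open(path, "wb") as out:
        for book_id in sorted(set(ids)):
            out.write(RECORD.pack(book_id))


def random_id_from_file(path):
    # Assumes a non-empty file; pick a record index, then seek directly to it
    count = os.path.getsize(path) // RECORD.size
    index = random.randrange(count)
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))[0]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "book_ids.bin")  # hypothetical file name
    write_id_file([42, 7, 42, 99], path)
    print(random_id_from_file(path))
```

Because every record has the same size, the record count comes from the file size alone, so a one-off pick costs one small read regardless of how many books there are.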

Upvotes: 3
