PythonNewbie
PythonNewbie

Reputation: 1163

How to return field as set() using peewee

I have currently worked abit with ORM using Peewee and I have been trying to understand how I am able to get the field url from the table. The condition is that column visible needs to be true as well. Meaning that if visible is True and the store_id is 4 then return all the url as set.

I have currently done something like this

from peewee import (
    Model,
    TextField,
    BooleanField
)
from playhouse.pool import PooledPostgresqlDatabase

# -------------------------------------------------------------------------
# Connection to Postgresql
# -------------------------------------------------------------------------
postgres_pool = PooledPostgresqlDatabase(
    'xxxxxxx',
    host='xxxxxxxx',
    user='xxxxxxxx',
    password='xxxxxx',
    max_connections=20,
    stale_timeout=30,
)


# ------------------------------------------------------------------------------- #

class Products(Model):
    store_id = TextField(column_name='store_id')
    url = TextField(column_name='url')
    visible = BooleanField(column_name='visible')

    class Meta:
        database = postgres_pool
        db_table = "develop"

    @classmethod
    def get_urls(cls):
        try:
            return set([i.url for i in cls.select().where((cls.store_id == 4) & (cls.visible))])
        except Products.IntegrityError:
            return None

However using the method takes around 0.13s which feels abit too long for me than what it supposed to do which I believe is due to the for loop and needing to put it as a set() and I wonder if there is a possibility that peewee can do something like cls.select(cls.url).where((cls.store_id == 4) & (cls.visible) and return as set()?

Upvotes: 0

Views: 220

Answers (1)

coleifer
coleifer

Reputation: 26245

How many products do you have? How big is this set? Why not use distinct() so that the database de-duplicates them for you? What indexes do you have? All of these questions are much more pertinent than "how do I make this python loop faster".

I'd suggest that you need an index on store_id, visible or store_id where visible.

create index "product_urls" on "products" ("store_id") where "visible"

You could even use a covering index but this may take up a lot of disk space:

create index "product_urls" on "products" ("store_id", "url") where visible

Once you've got the actual query sped up with an index, you can also use distinct() to make the db de-dupe the URLs before sending them to Python. Additionally, since you only need the URL, just select that column and use the tuples() method to avoid creating a class:

@classmethod
def get_urls(cls):
    query = cls.select(cls.url).where((cls.store_id == 4) & cls.visible)
    return set(url for url, in query.distinct().tuples())
    

Lastly please read the docs: http://docs.peewee-orm.com/en/latest/peewee/querying.html#iterating-over-large-result-sets

Upvotes: 1

Related Questions