Reputation: 1163
I have currently worked abit with ORM using Peewee and I have been trying to understand how I am able to get the field url
from the table. The condition is that column visible
needs to be true as well. Meaning that if visible
is True and the store_id
is 4
then return all the url
as set.
I have currently done something like this
from peewee import (
Model,
TextField,
BooleanField
)
from playhouse.pool import PooledPostgresqlDatabase
# -------------------------------------------------------------------------
# Connection to Postgresql
# -------------------------------------------------------------------------
postgres_pool = PooledPostgresqlDatabase(
'xxxxxxx',
host='xxxxxxxx',
user='xxxxxxxx',
password='xxxxxx',
max_connections=20,
stale_timeout=30,
)
# ------------------------------------------------------------------------------- #
class Products(Model):
store_id = TextField(column_name='store_id')
url = TextField(column_name='url')
visible = BooleanField(column_name='visible')
class Meta:
database = postgres_pool
db_table = "develop"
@classmethod
def get_urls(cls):
try:
return set([i.url for i in cls.select().where((cls.store_id == 4) & (cls.visible))])
except Products.IntegrityError:
return None
However using the method takes around 0.13s which feels abit too long for me than what it supposed to do which I believe is due to the for loop and needing to put it as a set()
and I wonder if there is a possibility that peewee can do something like cls.select(cls.url).where((cls.store_id == 4) & (cls.visible)
and return as set()?
Upvotes: 0
Views: 220
Reputation: 26245
How many products do you have? How big is this set? Why not use distinct()
so that the database de-duplicates them for you? What indexes do you have? All of these questions are much more pertinent than "how do I make this python loop faster".
I'd suggest that you need an index on store_id, visible
or store_id where visible
.
create index "product_urls" on "products" ("store_id") where "visible"
You could even use a covering index but this may take up a lot of disk space:
create index "product_urls" on "products" ("store_id", "url") where visible
Once you've got the actual query sped up with an index, you can also use distinct()
to make the db de-dupe the URLs before sending them to Python. Additionally, since you only need the URL, just select that column and use the tuples()
method to avoid creating a class:
@classmethod
def get_urls(cls):
query = cls.select(cls.url).where((cls.store_id == 4) & cls.visible)
return set(url for url, in query.distinct().tuples())
Lastly please read the docs: http://docs.peewee-orm.com/en/latest/peewee/querying.html#iterating-over-large-result-sets
Upvotes: 1