Reputation: 1101
I am using django and I am trying to come up with a query that will allow me to do the following,
I have a column in the database called url
. The url
column has values that are very long. Basically the domain name followed by a long list of query parameters.
Eg:
https://www.somesite.com/something-interesting-digital-cos-or-make-bleh/?utm_source=something&utm_medium=email&utm_campaign=biswanyam%20report%20-%20digital%20cos%20or%20analog%20prey&ut
http://www.anothersite.com/holly-moly/?utm_source=something&utm_medium=email&tm_campaign=biswanyam%20report%20-%20digital%20cos%20or%20analog%20prey&ut
https://www.onemoresite.com/trinkle-star/?utm_source=something&utm_medium=email&utm_campaign=biswanyam%20report%20-%20digital%20cos%20or%20analog%20prey&ut
https://www.somesite.com/nothing-interesting-bleh/?utm_source=something&utm_medium=email&utm_campaign=biswanyam%20report%20-%20digital%20cos%20or%20analog%20prey&ut
I want a django query that can basically give me an annotated count of urls with the same domain name regardless of the query parameters in the URL.
So essentially this is what I am looking for,
{
'url': 'https://www.somesite.com/something-interesting-digital-cos-or-make-bleh', 'count': 127,
'url': 'http://www.anothersite.com/holly-moly', 'count': 87,
'url': 'https://www.onemoresite.com/trinkle-star', 'count': 94,
'url': 'https://www.somesite.com/nothing-interesting-bleh', 'count':72
}
I tried this query,
Somemodel.objects.filter(url__iregex='http.*\/\?').values('url').annotate(hcount=Count('url'))
This doesn't work as expected. It does an entire URL match along with the query parameters instead of matching only the domain name. Can someone please tell me how I might accomplish this or at least point me in the right direction. Thanks
Upvotes: 2
Views: 494
Reputation: 23512
This can be done using a combination of the database functions StrIndex
and Substr
to create a substring annotation
Here's an example illustrating how to annotate a User
query with a "domain field" based on getting the text after the "@" sign:
from django.db.models.functions import StrIndex, Substr
from django.db.models import Value
User.objects.annotate(domain=Substr("email", 1 + StrIndex("email", Value("@"))))
For a url, you would want to slice the text before the "?"...check the docs on Substr for details.
Upvotes: 0
Reputation: 25539
This might not be possible because you cannot group by some partial information on a certain field. If you really want to achieve this, you might want to consider changing your schema. You should store url
and parameters
separately, as 2 model fields. Then you would have a method or if you want to make it look like an attribute, use @property
decorator, to combine them and return the whole url. It wouldn't be too hard to split them in a migration/script to fit the new schema.
Upvotes: 0