Reputation: 6958
I'm logging all user search queries in a model like this:
from django.db import models

class SearchLog(models.Model):
    query = models.CharField(max_length=512)
    datetime = models.DateTimeField(auto_now_add=True, db_index=True)
To get all queries that have at most one word, I use this queryset:
SearchLog.objects.exclude(query__contains=" ")
I want to get queries that have at most two words. Is there any way to do this, even with raw SQL?
Upvotes: 2
Views: 1989
Reputation: 476659
One can use a regular expression (regex) for this. A regex is a textual pattern that describes which strings should match.
For example, to match at most two words, a regex could look like:
^\S+(\s+\S+)?$
(but depending on the situation, you might have to alter it a bit).
The \S stands for a non-space character (i.e. anything except a space, tab, new line, etc.). We repeat such characters one or more times (with the + quantifier). Next we optionally allow a second word (that is the meaning of the question mark ? at the end). This second word consists of one or more consecutive whitespace characters (with \s+) followed by one or more non-space characters (with \S+). The caret (^) and dollar ($) anchors denote the beginning and end of the string (without them, the pattern would match anything that has at least one word). As said before, one of the problems might be what you consider a word, so depending on that specification, you might have to change the regex a bit.
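To see how the pattern behaves, here is a minimal sketch that tries it with Python's re module. The sample queries and the helper name matches_at_most_two_words are made up for illustration, and the regex dialect of your database may differ slightly from Python's.

```python
import re

# The pattern from the answer: one word, optionally followed by a second word.
AT_MOST_TWO_WORDS = re.compile(r'^\S+(\s+\S+)?$')


def matches_at_most_two_words(text):
    """Return True if the text consists of one or two whitespace-separated words."""
    return AT_MOST_TWO_WORDS.search(text) is not None


# Hypothetical sample queries, made up for illustration.
samples = [
    'django',               # one word
    'django regex',         # two words
    'django  regex',        # two words, several spaces in between
    'django regex filter',  # three words
    '',                     # no words at all
    ' django',              # leading whitespace
]

for sample in samples:
    print(repr(sample), matches_at_most_two_words(sample))
```

Note that the empty string and the query with leading whitespace are both rejected, which may or may not be what you want.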
If, for example, queries with no words at all should be matched as well, we have to change it to ^(\S+(\s+\S+)?)?$ but then strings that contain only whitespace are still not matched. As you can see, it can be difficult to get the pattern completely right, since it depends on what you consider a "match" and what not.
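The difference can be checked with a small sketch in Python's re module. The second pattern, which also tolerates leading and trailing whitespace, is an assumed variant and not part of the original answer; the names ALLOW_EMPTY, LENIENT, allows_empty and lenient are illustrative only.

```python
import re

# Pattern from the answer that also accepts the empty string.
ALLOW_EMPTY = re.compile(r'^(\S+(\s+\S+)?)?$')

# Assumed variant (not from the answer): additionally tolerate leading and
# trailing whitespace, so whitespace-only strings match as well.
LENIENT = re.compile(r'^\s*(\S+(\s+\S+)?)?\s*$')


def allows_empty(text):
    """Match zero, one or two words, with no surrounding whitespace."""
    return ALLOW_EMPTY.search(text) is not None


def lenient(text):
    """Match zero, one or two words, ignoring surrounding whitespace."""
    return LENIENT.search(text) is not None


for text in ['', '   ', 'a b', ' a b ', 'a b c']:
    print(repr(text), allows_empty(text), lenient(text))
```

Whether whitespace-only queries should count as "at most two words" is a decision about your data, so treat the lenient variant as one possible choice.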
You can test the regex with regex101. There, the lines that match are highlighted, while lines with three or more words are not, so the regex excludes those. You can use the tool to adjust the regex until it matches your requirements exactly.
So we can filter with:
SearchLog.objects.filter(query__regex=r'^\S+(\s+\S+)?$')
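If the word limit might change later, the pattern can be generated instead of hard-coded. The following is a sketch under that assumption; max_words_pattern is a hypothetical helper, not a Django function.

```python
import re


def max_words_pattern(max_words):
    """Build a regex string matching between one and max_words words."""
    if max_words < 1:
        raise ValueError('max_words must be at least 1')
    # One word, followed by up to (max_words - 1) "whitespace + word" groups.
    return r'^\S+(\s+\S+){0,%d}$' % (max_words - 1)


# Quick check with Python's re module; the sample strings are made up.
pattern = max_words_pattern(2)
print(pattern)
print(bool(re.search(pattern, 'django regex')))
print(bool(re.search(pattern, 'django regex filter')))
```

With the model from the question, the result could then be passed as SearchLog.objects.filter(query__regex=max_words_pattern(2)). Django hands the pattern to the database backend, so the supported regex syntax depends on the database in use; if \s and \S are not understood there, the pattern may need adjusting.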
Regexes are capable of rather advanced matching. In computer science there is, however, the famous "pumping lemma for regular languages", which implies that certain families of patterns cannot be written as regular expressions (and in fact there are families of patterns that cannot be matched by any program at all). Here that does not matter much (I think), but a regex is thus not necessarily capable of matching every pattern a programmer has in mind.
Upvotes: 3