Reputation: 768
Context: I have a database of houses which users can query by location name (e.g. 'New York, NY' or '100 1st Street, San Francisco'). I'm using Google Maps API to retrieve pinpoints on a map of each of the query results, in addition to a list of each of the objects. I'm using Django and Postgres as my framework and DB respectively.
Problem: I'm wondering how to filter the House objects by their location, but not all location queries will contain the same information (i.e. some may have a house number, some may have a city or state, and some may have neither).
As shown by the code below, every House object is linked via a OneToOneField to a Location object which contains the necessary fields. This is further complicated by the fact that every Location object is made up of several fields, whereas the query is a single string that might not match any one field the way a value passed to Django's filter() method would. A query such as '100 1st Street, San Francisco' doesn't match any individual Location field, since it spans several of them. How might I write an algorithm of sorts to find any objects that match a given query as described?
Code:
models.py:
class House(models.Model):
    ...
    mapped_location = models.OneToOneField(Location, related_name='location_house')
    ...

class Location(models.Model):
    ...
    name = models.CharField(...)
    street_name = models.CharField(...)
    city = models.CharField(...)
views.py:
def show_results(request):
    House.objects.filter( ??? )
    return render(request, 'results.html', context)
Let me know if I need to post any more code, thanks!
Upvotes: 0
Views: 1200
Reputation: 1508
If you're giving the user free-form, unvalidated, natural language input, you need to be looking at a different set of tools, namely those geared towards natural language processing.
The first step would be token recognition, deciphering what pieces of the input string are street numbers, street names, cities, states, etc. This is similar to part-of-speech analysis, but with a different syntax.
If typos and inaccuracies are to be accounted for, the next step is 'fuzzy matching', i.e. using something like Levenshtein edit distance or n-gram analysis to find the 'best' matches within the 'canonical' values for each token type.
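As a rough illustration of the fuzzy-matching step, here is a minimal sketch using Python's stdlib difflib; the canonical city list is hypothetical (in practice you'd pull it from your DB), and dedicated Levenshtein libraries give finer control over the scoring:

```python
import difflib

# Hypothetical canonical values; in practice these would come from the DB.
CANONICAL_CITIES = ["San Francisco", "San Jose", "Sacramento", "New York"]

def best_city_match(token, cutoff=0.6):
    # Return the closest canonical city for a possibly misspelled token,
    # or None if nothing scores above the similarity cutoff.
    matches = difflib.get_close_matches(token, CANONICAL_CITIES, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(best_city_match("San Fransisco"))  # typo still resolves to "San Francisco"
```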
From there, you're in a position to query your db.
Refocusing on your actual question, how to mitigate an unknown combination of tokens: you have the option of validating the data via some kind of interface or form, or of taking the part-of-speech and NLP path described above.
Upvotes: 0
Reputation: 2170
The first step is to tokenize the search term, then you can build your query based on the various parts. Tokenization can be rather complex, so for best results you probably want to start with an existing library such as https://github.com/datamade/usaddress. Google search for "python address tokenization" for other options.
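To give a sense of what a tokenizer produces, here is a deliberately naive hand-rolled sketch (a real application should prefer a library like usaddress; this version simply assumes a comma separates the street part from the city, and the token keys are my own invented names):

```python
import re

def tokenize_address(query):
    # Naive tokenizer: text after the last comma is treated as the city,
    # and an optional leading run of digits as the house number.
    parts = [p.strip() for p in query.split(',')]
    city = parts[-1] if len(parts) > 1 else None
    m = re.match(r'(\d+)\s+(.*)', parts[0])
    if m:
        return {'number': m.group(1), 'street': m.group(2), 'city': city}
    return {'number': None, 'street': parts[0] or None, 'city': city}

print(tokenize_address('100 1st Street, San Francisco'))
# → {'number': '100', 'street': '1st Street', 'city': 'San Francisco'}
```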
Next, you can use those parts in a query using Q objects:
from django.db.models import Q
House.objects.filter(
    Q(mapped_location__name="100") |
    Q(mapped_location__street_name="1st Street") |
    Q(mapped_location__city="San Francisco")
)
(but replace the string constants above with the results from the tokenizer)
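Since not every query will contain every token, one way to glue the steps together is to build the lookup dict dynamically from whatever the tokenizer found (a sketch; the token keys 'number'/'street'/'city' and the helper name are my own assumptions, not part of any library):

```python
def location_lookups(tokens):
    # Map hypothetical tokenizer output keys to ORM lookup paths
    # matching the Location model from the question.
    field_map = {
        'number': 'mapped_location__name',
        'street': 'mapped_location__street_name',
        'city': 'mapped_location__city',
    }
    # Keep only the tokens the tokenizer actually found.
    return {lookup: tokens[key]
            for key, lookup in field_map.items() if tokens.get(key)}

# Each entry can then be wrapped in a Q and OR'd together:
#   query = Q()
#   for lookup, value in location_lookups(tokens).items():
#       query |= Q(**{lookup: value})
#   House.objects.filter(query)
```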
Upvotes: 2
Reputation: 702
You can pass your query string via the request's GET parameters with comma-separated fields, then check each piece against your model's fields individually; your show_results
will look like:
from urllib.parse import unquote

from django.db.models import Q

def show_results(request):
    q = request.GET.get('query', '')
    results = []
    for query in unquote(q).split(','):
        query = query.strip()
        results += list(House.objects.filter(
            Q(mapped_location__name=query) |
            Q(mapped_location__street_name=query) |
            Q(mapped_location__city=query)
        ))
    context = {'results': results}
    return render(request, 'results.html', context)
Then you can access the results in your template via the results variable.
Upvotes: 2