jayt
jayt

Reputation: 768

Django: Querying database with custom search algorithm

Context: I have a database of houses which users can query by location name (e.g. 'New York, NY' or '100 1st Street, San Francisco'). I'm using Google Maps API to retrieve pinpoints on a map of each of the query results, in addition to a list of each of the objects. Am using Django and Postgres as my framework and DB respectively.

Problem: I'm wondering how to filter the House objects by their location, but not all location queries will contain the same information (i.e. some may have a house number, some may have a city or state, and some may not).

As shown by the code below, every House object is linked via a OneToOneField to a Location object which contains the necessary fields.

This is also complicated by the fact that every Location object is made up of several fields, whereas the query will be a string that might not match a single field as you would use in Django's filter() method. A query such as '100 1st Street, San Francisco' doesn't match any of the individual Location fields since this query is made up of several fields. How might I write an algorithm of sorts to find any objects that match a given query as described?

Code:

models.py:

class House(models.Model):
    ...
    mapped_location = models.OneToOneField(Location, related_name='location_house')
    ...

class Location(models.Model):
    ...
    name = models.CharField(...)
    street_name = models.CharField(...)
    city = models.CharField(...)

views.py:

def show_results(request):
    House.objects.filter( ??? )
    return render(request, 'results.html', context)

Let me know if I need post anymore code, thanks!

Upvotes: 0

Views: 1200

Answers (3)

John R
John R

Reputation: 1508

If you're giving the user free, unvalidated, natural language input opportunity, you need to be looking at a different set of tools, namely those geared towards natural language processing.

The first step would be token recognition, deciphering what pieces of the input string are street numbers, street names, cities, states, etc. This is similar to part-of-speech analysis, but with a different syntax.

If typos and inaccuracies are to be accounted for, the next step is 'fuzzy matching' aka using something like levenshtein edit distance or ngram analysis to find the 'best' matches within the 'canonical' values for each token type.

From there, you're in a position to query your db.

Refocusing in your actual question, how to mitigate an unknown combination of tokens, you have the option of validating the data via some kind of interface or form, or in taking the part-of-speech and nlp path described above.

Upvotes: 0

Pwnosaurus
Pwnosaurus

Reputation: 2170

The first step is to tokenize the search term, then you can build your query based on the various parts. Tokenization can be rather complex, so for best results you probably want to start with an existing library such as https://github.com/datamade/usaddress. Google search for "python address tokenization" for other options.

Next, you can use those parts in a query using Q objects:

from django.db.models import Q

House.objects.filter(
    Q(mapped_location__name="100") | 
    Q(mapped_location__street_name="1st Street") | 
    Q(mapped_location__city="San Francisco")
)

(but replace the string constants above with the results from the tokenizer)

Upvotes: 2

Bakhrom Rakhmonov
Bakhrom Rakhmonov

Reputation: 702

You pass your query string by request GET with comma separated fields and check every field to your models fields individually and your show_results will be like:

from django.db.models import Q

def show_results(request):
    q = request.GET.get('query')
    results = []
    for query in unquote(q).split(','):
        query = query.strip()
        results += list(House.objects.filter(Q(mapped_location__name=query) | Q(mapped_location__street_name=query) | Q(mapped_location__city=query)))
    context['results'] = results
    return render(request, 'results.html', context)

And you get your results in template with variable results

Upvotes: 2

Related Questions