Reputation: 624
I have a simple Django search form set up in my web application, where users can search for particular words in my Arabic corpus. Users can search in one of three ways: 'Exact' (the word just as it's typed), 'Stem' (which brings up all inflected forms of the lemma entered), and 'RegEx' (where they can do more complex searches by regular expression).
The problem I'm having is that if the user submits an invalid regex, it triggers a 500 server error instead of giving them a validation error or empty results, which I imagine is confusing. Below is the traceback for such an error, caused by searching for a regex with unbalanced parentheses: ha((.*(?!al))
Is there any way to catch this kind of error, or make it more user-friendly? (I've also included the code for my form below.)
Thank you.
class ConcordanceForm(forms.Form):
    searchterm = forms.CharField(max_length=100, required=True)
    search_type = forms.ChoiceField(widget=RadioSelect(),
        choices=([('string', 'Exact'), ('lemma', 'Stem'), ('regex', 'Regex')]),
        required=True)
def concord_test(request):
    if request.method == 'POST':
        form = ConcordanceForm(request.POST)
        if form.is_valid():
            searchterm = form.cleaned_data['searchterm'].encode('utf-8')
            search_type = form.cleaned_data['search_type']
            context, texts_len, results_len = make_concordance(searchterm, search_type)
            return render_to_response('corpus/concord.html', locals())
    else:
        form = ConcordanceForm()
    return render_to_response('corpus/search_test.html',
        {'form': form}, context_instance=RequestContext(request))
<p style=" font-weight:bold;">Search for any word in the corpus:</p>
<form action="/search_test/" method="post">{% csrf_token %}
{{ form.as_p }}
<input type="submit" value="Submit" />
</form>
Traceback (most recent call last):
  File "/home/larapsodia/webapps/django/lib/python2.6/django/core/handlers/base.py", line 100, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/home/larapsodia/webapps/django/tunisiya2/corpus/views.py", line 154, in concord_test
    context, texts_len, results_len = make_concordance(searchterm, search_type)
  File "/home/larapsodia/webapps/django/tunisiya2/corpus/views.py", line 91, in make_concordance
    p = re.compile(r'\b' + searchterm + r'__') # initial position in word_pos_lemma string
  File "/usr/local/lib/python2.6/re.py", line 190, in compile
    return _compile(pattern, flags)
  File "/usr/local/lib/python2.6/re.py", line 245, in _compile
    raise error, v # invalid expression
error: unbalanced parenthesis
<WSGIRequest
GET:<QueryDict: {}>,
POST:<QueryDict: {u'searchterm': [u'ha((.*(?!al))'], u'search_type': [u'regex'], u'csrfmiddlewaretoken': [u'c9a6cad4a0761580f5e351e9e534e028']}>,
COOKIES:{'__utma': '58037167.1544119768.1401037185.1401381302.1401384825.14',
'__utmb': '58037167.10.10.1401384825',
'__utmc': '58037167',
'__utmz': '58037167.1401037185.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)',
'csrftoken': 'c9a6cad4a0761580f5e351e9e534e028',
'sessionid': '8d5b0b8730ccce0860b687b4c7ec1fdb'},
META:{'CONTENT_LENGTH': '109',
'CONTENT_TYPE': 'application/x-www-form-urlencoded',
'CSRF_COOKIE': 'c9a6cad4a0761580f5e351e9e534e028',
'DOCUMENT_ROOT': '/usr/local/apache2/htdocs',
'GATEWAY_INTERFACE': 'CGI/1.1',
'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'HTTP_ACCEPT_ENCODING': 'gzip,deflate,sdch',
'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.8,ar;q=0.6',
'HTTP_CACHE_CONTROL': 'max-age=0',
'HTTP_CONNECTION': 'close',
'HTTP_COOKIE': 'sessionid=8d5b0b8730ccce0860b687b4c7ec1fdb; csrftoken=c9a6cad4a0761580f5e351e9e534e028; __utma=58037167.1544119768.1401037185.1401381302.1401384825.14; __utmb=58037167.10.10.1401384825; __utmc=58037167; __utmz=58037167.1401037185.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)',
'HTTP_FORWARDED_REQUEST_URI': '/search_test/',
'HTTP_HOST': 'www.tunisiya.org',
'HTTP_HTTPS': 'off',
'HTTP_HTTP_X_FORWARDED_PROTO': 'http',
'HTTP_ORIGIN': 'http://www.tunisiya.org',
'HTTP_REFERER': 'http://www.tunisiya.org/search_test/',
'HTTP_USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36',
'HTTP_X_FORWARDED_FOR': '68.9.41.110',
'HTTP_X_FORWARDED_HOST': 'www.tunisiya.org',
'HTTP_X_FORWARDED_PROTO': 'http',
'HTTP_X_FORWARDED_SERVER': 'www.tunisiya.org',
'HTTP_X_FORWARDED_SSL': 'off',
'PATH_INFO': u'/search_test/',
'PATH_TRANSLATED': '/home/larapsodia/webapps/django/tunisiya2.wsgi/search_test/',
'QUERY_STRING': '',
'REMOTE_ADDR': '127.0.0.1',
'REMOTE_PORT': '37086',
'REQUEST_METHOD': 'POST',
'REQUEST_URI': '/search_test/',
'SCRIPT_FILENAME': '/home/larapsodia/webapps/django/tunisiya2.wsgi',
'SCRIPT_NAME': u'',
'SERVER_ADDR': '127.0.0.1',
'SERVER_ADMIN': '[no address given]',
'SERVER_NAME': 'www.tunisiya.org',
'SERVER_PORT': '80',
'SERVER_PROTOCOL': 'HTTP/1.0',
'SERVER_SIGNATURE': '',
'SERVER_SOFTWARE': 'Apache/2.2.15 (Unix) mod_wsgi/3.2 Python/2.6.8',
'mod_wsgi.application_group': 'tunisiya2.com|',
'mod_wsgi.callable_object': 'application',
'mod_wsgi.handler_script': '',
'mod_wsgi.input_chunked': '0',
'mod_wsgi.listener_host': '',
'mod_wsgi.listener_port': '39877',
'mod_wsgi.process_group': '',
'mod_wsgi.request_handler': 'wsgi-script',
'mod_wsgi.script_reloading': '1',
'mod_wsgi.version': (3, 2),
'wsgi.errors': <mod_wsgi.Log object at 0xd69b570>,
'wsgi.file_wrapper': <built-in method file_wrapper of mod_wsgi.Adapter object at 0xa7efda0>,
'wsgi.input': <mod_wsgi.Input object at 0xd69b598>,
'wsgi.multiprocess': False,
'wsgi.multithread': True,
'wsgi.run_once': False,
'wsgi.url_scheme': 'http',
'wsgi.version': (1, 1)}>
Upvotes: 1
Views: 381
Reputation: 624
I ended up building a custom cleaner, as Antti mentioned. This is what worked in the end:
def clean(self):
    cleaned_data = self.cleaned_data
    searchterm = cleaned_data.get('searchterm')
    search_type = cleaned_data.get('search_type')
    # searchterm is None if the field itself failed validation, so skip the check then
    if search_type == 'regex' and searchterm:
        try:
            re.search(searchterm, 'randomdatastring')  # this is just to test if the regex is valid
        except re.error:
            raise forms.ValidationError("Invalid regular expression.")
    return cleaned_data
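The check inside the clean method above can also be tried outside Django. The following is only a minimal sketch: regex_error is a hypothetical helper name, and it assumes the view keeps wrapping the search term as r'\b' + searchterm + r'__', as shown in the traceback. Compiling that same wrapped pattern is one way to keep the validation and the actual search in agreement.

```python
import re


def regex_error(searchterm):
    # Hypothetical helper: returns None when the pattern compiles,
    # otherwise a short message suitable for a form error.
    # Assumption: the wrapping below mirrors what make_concordance does
    # in the traceback (word boundary prefix, '__' suffix).
    try:
        re.compile(r'\b' + searchterm + r'__')
    except re.error as exc:
        return 'Invalid regular expression: {0}'.format(exc)
    return None


# The pattern from the question is rejected; a well-formed one passes.
print(regex_error('ha((.*(?!al))'))
print(regex_error('ha.*'))
```

Inside a form, the returned message could be passed to forms.ValidationError in place of the fixed "Invalid regular expression." string.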
Upvotes: 1
Reputation: 133919
Wrap the make_concordance call in try-except; if an exception occurs, render the original form template for the user, along with the error information.
import re

try:
    context, texts_len, results_len = make_concordance(searchterm, search_type)
except re.error as e:
    # The form field is named 'searchterm'; field errors are stored as an error list
    form._errors['searchterm'] = form.error_class([str(e)])
    del form.cleaned_data['searchterm']
    return render_to_response('corpus/search_test.html',
        {'form': form}, context_instance=RequestContext(request))
A better way would probably be to write a custom cleaner for the form, but that seems a bit more complicated, and I do not have Django available.
Upvotes: 1
Reputation: 15170
To build on @Sam's comment, here's how to capture the specific error when a regular expression fails to compile:
import re

err_message = None
try:
    re.compile('(unbalanced')
except re.error as exc:
    # '{0}' rather than '{}': auto-numbered fields need Python 2.7 or later
    err_message = 'Uhoh: {0}'.format(exc)
print err_message
Output:
Uhoh: unbalanced parenthesis
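To make the message friendlier still, the captured exception can be turned into a small report that points at the offending position. This is a sketch under assumptions: describe_regex_error is a hypothetical name, and the msg and pos attributes on re.error are only available on Python 3.5 and later, so on the Python 2.6 setup from the question the code falls back to the plain message.

```python
import re


def describe_regex_error(pattern):
    # Hypothetical helper: returns None for a valid pattern, otherwise
    # a message, with a caret under the reported position when the
    # exception provides one (Python 3.5+ only).
    try:
        re.compile(pattern)
    except re.error as exc:
        pos = getattr(exc, 'pos', None)
        if pos is None:
            # Older Pythons, or errors without a position
            return str(exc)
        marker = ' ' * pos + '^'
        return '{0}\n{1}\n{2}'.format(exc.msg, pattern, marker)
    return None


print(describe_regex_error('(unbalanced'))
```

The exact wording of the message differs between Python versions, so it is safer to display it than to match against it.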
Upvotes: 0