CptSupermrkt

Effective data validation in Python

I use Python and sqlite3 to maintain a database. Before I insert anything, I need to perform validation checks on the data, so that I know what's going in is actually not crap input, even though the very forgiving database would happily accept it.

To accomplish this, I have a class that represents a database record, with fields that match the table's columns 1:1, and in the same module I define validation functions like these:

import re

def validate_path(path):
    if path is None:
        return False
    if len(path) < 5:
        return False
    if not re.search(r"^/[a-zA-Z0-9_/]+/$", path):
        return False
    else:
        return True

def validate_server(server):
    if server is None:
        return False
    if len(server) < 3:
        return False
    if not re.search(r"^[a-z][a-zA-Z0-9]+$", server):
        return False
    else:
        return True

def validate_name(name):
    if name is None:
        return False
    if len(name) < 3:
        return False
    if not re.search(r"^[a-z][a-z_]+$", name):
        return False
    else:
        return True

This is easy to understand and accomplishes my goals, but I feel that it's inefficient, uses too many if statements, and is a very "beginner" approach to the problem.

Then in my class, I'll have a method like:

    def validate(self):
        if (not validate_name(self.name)
                or not validate_server(self.home_file_server)
                or not validate_path(self.home_path)
                or not validate_server(self.web_file_server)
                or not validate_path(self.web_path)):
            return False
        else:
            return True

so that before running my INSERT command, I run:

if not dbrecord.validate():
    pass  # do not insert
else:
    pass  # insert
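For a concrete picture of that flow, here is a minimal Python 3 sketch that validates a record before inserting it into an in-memory sqlite3 database. The table `accounts`, the class `AccountRecord`, its two columns, and the helper `insert_record` are hypothetical stand-ins for the real schema; the two validators are condensed copies of the functions above so the block runs on its own.

```python
import re
import sqlite3


def validate_name(name):
    # Same rules as validate_name in the question: not None, >= 3 chars, pattern match.
    return name is not None and len(name) >= 3 and re.search(r"^[a-z][a-z_]+$", name) is not None


def validate_path(path):
    # Same rules as validate_path in the question: not None, >= 5 chars, pattern match.
    return path is not None and len(path) >= 5 and re.search(r"^/[a-zA-Z0-9_/]+/$", path) is not None


class AccountRecord(object):
    """Hypothetical record class: one attribute per column of the hypothetical accounts table."""

    def __init__(self, name, home_path):
        self.name = name
        self.home_path = home_path

    def validate(self):
        return validate_name(self.name) and validate_path(self.home_path)


def insert_record(conn, record):
    """Insert the record only if it passes validation; return True if it was inserted."""
    if not record.validate():
        return False  # do not insert
    # "?" placeholders let sqlite3 handle quoting of the values.
    conn.execute(
        "INSERT INTO accounts (name, home_path) VALUES (?, ?)",
        (record.name, record.home_path),
    )
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, home_path TEXT)")
ok = insert_record(conn, AccountRecord("jsmith", "/home/jsmith/"))
rejected = insert_record(conn, AccountRecord("J", "home"))
count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
```

The rejected record never reaches the INSERT, so only the valid row ends up in the table.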

The individual validation functions are kept at the module level (not in the class) because I use the same functions to evaluate user input in a client script. For example:

while True:
    home_path = raw_input("Home directory path: ")

    if not AccountTypeManager.validate_path(home_path):
        print "Invalid home directory path"
        logger.debug("User input home directory path {0} deemed invalid".format(home_path))
        continue

    logger.debug("User input home directory path {0} accepted as valid input.".format(home_path))
    break
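The same prompt-and-retry loop could be factored into a small reusable helper, so each field doesn't need its own copy. This is a hedged Python 3 sketch (`input()` replaces Python 2's `raw_input()`); `prompt_until_valid` and its `label` and `read` parameters are hypothetical names, and `read` exists only so the loop can be driven by canned answers instead of the keyboard.

```python
import logging
import re

logger = logging.getLogger(__name__)


def validate_path(path):
    # Same rules as validate_path in the question.
    return path is not None and len(path) >= 5 and re.search(r"^/[a-zA-Z0-9_/]+/$", path) is not None


def prompt_until_valid(prompt, validator, label, read=input):
    """Keep asking until validator(answer) is true, then return the answer."""
    while True:
        answer = read(prompt)
        if not validator(answer):
            print("Invalid {0}".format(label))
            logger.debug("User input %s %r deemed invalid", label, answer)
            continue
        logger.debug("User input %s %r accepted as valid input.", label, answer)
        return answer


# Demo with canned answers instead of the keyboard: the first answer is rejected.
answers = iter(["tmp", "/home/jsmith/"])
home_path = prompt_until_valid("Home directory path: ", validate_path,
                               "home directory path", read=lambda prompt: next(answers))
```

In the real client script the call would simply omit `read`, so the helper falls back to `input()`.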

I validate in the class method as well as in the client script because the method could be reused elsewhere, in code that doesn't take client input. Regardless of where the data came from, it must be checked before it goes into the database.

My primary concerns are maintainability and readability. I'd like to avoid using an external module for this, as my successor will very likely have no Python experience and will start from scratch like I did. I want to accomplish the above in a "Pythonic" way that is also approachable for someone without a Python background.

I've also heard that in Python, it's better to use Exceptions for validation, as opposed to True/False values. Is this true?
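To make the comparison concrete, here is a minimal Python 3 sketch of the exception-based style. `ValidationError`, `check_path` and `is_valid_path` are hypothetical names; the idea is that a failed check raises an exception carrying the reason, instead of returning a bare False, and the caller decides what to do with it.

```python
import re


class ValidationError(ValueError):
    """Raised when a value fails validation (hypothetical exception class)."""


def check_path(path):
    # Same rules as validate_path, but each failure explains itself.
    if path is None:
        raise ValidationError("path is missing")
    if len(path) < 5:
        raise ValidationError("path {0!r} is shorter than 5 characters".format(path))
    if not re.search(r"^/[a-zA-Z0-9_/]+/$", path):
        raise ValidationError(
            "path {0!r} must start and end with '/' and contain only "
            "letters, digits, '_' or '/'".format(path))
    return path


def is_valid_path(path):
    # Thin True/False wrapper for callers that only need a yes/no answer.
    try:
        check_path(path)
    except ValidationError:
        return False
    return True


# The caller can catch the exception and report the specific reason.
try:
    check_path("tmp")
except ValidationError as err:
    message = str(err)
```

One argument often made for this style is that the reason travels with the failure and an unhandled failure can't be silently ignored; True/False stays simpler when a yes/no answer is all the caller needs.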


Answers (1)

C.B.

I see some redundancy in your code. Things that I think could help are:

Store the patterns for matching in a dict. (Going this route, you would also have to store the minimum lengths.)

Consolidate the validate functions into one.

Consolidate the if statements into a single call to Python's built-in all().

Consider using "and" in your validate function, so that a single False makes the whole result False.

E.g.

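import re

pattern_dict = {'server': r'^[a-z][a-zA-Z0-9]+$',
                # ... patterns for the other input types
               }

def aux_validate(my_input, input_type):
    # Test for None first: the list passed to all() is built eagerly, so
    # re.search() would otherwise receive None and raise a TypeError.
    return my_input is not None and all([re.search(pattern_dict[input_type], my_input),
                                         # ... further checks, e.g. a minimum length
                                         ])


def validate(self):
    # self.inputs: an iterable of (value, input_type) pairs
    validator = True
    for my_input, input_type in self.inputs:
        validator = validator and aux_validate(my_input, input_type)
    return validator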

If you want to get a little fancier, you could build a list comprehension in validate and return all() of it:

return all([aux_validate(my_input,input_type) for my_input,input_type in self.inputs])
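Putting these suggestions together, here is one self-contained Python 3 sketch. It uses a rule table holding both the minimum length and the pattern for each input type, filled in from the three validators in the question; the names `RULES` and `Record` and the `inputs` property are assumptions for illustration. It passes a generator expression rather than a list to all(), so checking stops at the first failing field.

```python
import re

# Hypothetical rule table: input type -> (minimum length, regex), taken from the question.
RULES = {
    "name": (3, r"^[a-z][a-z_]+$"),
    "server": (3, r"^[a-z][a-zA-Z0-9]+$"),
    "path": (5, r"^/[a-zA-Z0-9_/]+/$"),
}


def aux_validate(my_input, input_type):
    """Return True if my_input satisfies the rule registered for input_type."""
    min_len, pattern = RULES[input_type]
    # "and" short-circuits, so len() and re.search() never see None.
    return (my_input is not None
            and len(my_input) >= min_len
            and re.search(pattern, my_input) is not None)


class Record(object):
    """Hypothetical record class; attribute names follow the question."""

    def __init__(self, name, home_file_server, home_path, web_file_server, web_path):
        self.name = name
        self.home_file_server = home_file_server
        self.home_path = home_path
        self.web_file_server = web_file_server
        self.web_path = web_path

    @property
    def inputs(self):
        # (value, input type) pairs consumed by validate().
        return [
            (self.name, "name"),
            (self.home_file_server, "server"),
            (self.home_path, "path"),
            (self.web_file_server, "server"),
            (self.web_path, "path"),
        ]

    def validate(self):
        # Generator expression: all() stops at the first failing field.
        return all(aux_validate(value, kind) for value, kind in self.inputs)


good = Record("jsmith", "fs01", "/home/jsmith/", "web01", "/var/www/jsmith/")
bad = Record("jsmith", "fs01", None, "web01", "/var/www/jsmith/")
```

Adding a new kind of field then means adding one entry to the rule table rather than writing another function.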

In the end, this is partly a matter of opinion, and you might be better served asking on http://codereview.stackexchange.com
