Sam Atcheson
Sam Atcheson

Reputation: 95

How does Python provide a maintainable way to pass data structures around in a system?

I am new to dynamic languages in general, and I have discovered that languages like Python prefer simple data structures, like dictionaries, for sending data between parts of a system (across functions, modules, etc).

In the C# world, when two parts of a system communicate, the developer defines a class (possibly one that implements an interface) that contains properties (like a Person class with a Name, Birth date, etc) where the sender in the system instantiates the class and assigns values to the properties. The receiver then accesses these properties. The class is called a DTO and it is "well- defined" and explicit. If I remove a property from the DTO's class, the compiler will instantly warn me of all parts of the code that use that DTO and are attempting to access what is now a non-existent property. I know exactly what has broken in my codebase.

In Python, functions that produce data (senders) create implicit DTOs by building up dictionaries and returning them. Coming from a compiled world, this scares me. I immediately think of the scenario of a large code base where a function producing a dictionary has the name of a key changed (or a key is removed altogether) and boom- tons of potential KeyErrors begin to crop up as pieces of the code base that work with that dictionary and expect a key are no longer able to access the data they were expecting. Without unit testing, the developer would have no reliable way of knowing where these errors will appear.

Maybe I misunderstand altogether. Are dictionaries a best practice tool for passing data around? If so, how do developers solve this kind of problem? How are implicit data structures and the functions that use them maintained? How do I become less afraid of what seems like a huge amount of uncertainty?

Upvotes: 6

Views: 2076

Answers (3)

6502
6502

Reputation: 114579

When using Python for a large project using automated testing is a must because otherwise you would never dare to do any serious refactoring and the code base will rot in no time as all your changes will always try to touch nothing leading to bad solution (simply because you'd be too scared to implement the correct solution instead).

Indeed the above is true even with C++ or, as it often happens with large projects, with mixed-languages solutions.

Not longer than a few HOURS ago for example I had to make a branch for a four line bugfix (one of the lines is a single brace) for a specific customer because the trunk evolved too much from the version he has in production and the guy in charge of the release process told me his use cases have not yet been covered with manual testing in current version and therefore I cannot upgrade his installation.

The compiler can tell you something, but it cannot provide any confidence in stability after a refactoring if the software is complex. The idea that if some piece of code compiles then it's correct is bogus (possibly with the exception of hello_world.cpp).

That said you normally don't use dictionaries in Python for everything, unless you really care about the dynamic aspect (but in this case the code doesn't access the dictionary with a literal key). If your Python code has a lot of d["foo"] instead of d[k] when using dicts then I'd say there is a smell of a design problem.

Upvotes: 2

Burhan Khalid
Burhan Khalid

Reputation: 174682

Coming from a compiled world, this scares me. I immediately think of the scenario of a large code base where a function producing a dictionary has the name of a key changed (or a key is removed altogether) and boom- tons of potential KeyErrors begin to crop up as pieces of the code base that work with that dictionary and expect a key are no longer able to access the data they were expecting.

I would just like to highlight this part of your question, because I feel this is the main point you are trying to understand.

Python's development philosophy is a bit different; as objects can mutate without throwing errors (for example, you can add properties to instances without having them declared in the class) a common programming practice in Python is EAFP:

EAFP

Easier to ask for forgiveness than permission. This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many try and except statements. The technique contrasts with the LBYL style common to many other languages such as C.

The LBYL referred to from the quote above is "Look Before You Leap":

LBYL

Look before you leap. This coding style explicitly tests for pre-conditions before making calls or lookups. This style contrasts with the EAFP approach and is characterized by the presence of many if statements.

In a multi-threaded environment, the LBYL approach can risk introducing a race condition between “the looking” and “the leaping”. For example, the code, if key in mapping: return mapping[key] can fail if another thread removes key from mapping after the test, but before the lookup. This issue can be solved with locks or by using the EAFP approach.

So I would say this is a bit of the norm and in Python you expect the objects will behave well and handle themselves with grace (mainly by throwing up lots of exceptions). Traditional "object hiding" and "interface contracts" are not what Python is all about. It is just like learning anything else, you have to acclimate to the programming environment and its rules.

The other part of your question:

Are dictionaries a best practice tool for passing data around? If so, how do developers solve this kind of problem?

The answer here is depends on your problem domain. If your problem domain does not lend itself to custom objects, then you can pass around any kind of container (lists, tuples, dictionaries) around. If however all you have to pass around decorated data ("rich" data) is objects, then your code becomes littered with classes that don't define behavior but rather properties of things.

Oh, by the way - this getting of keys and raising KeyError problem is already solved, as Python dictionaries have a get method, which can return a default value (it returns the sentinel None object by default) when a key doesn't exist:

>>> d = {'a': 'b'}
>>> d['b']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'b'
>>> d.get('b')  # None is returned, which is the only
                # object that is not printed by the
                # Python interactive interpreter.
>>> d.get('b','default')
'default'

Upvotes: 5

MAK
MAK

Reputation: 26586

I don't think passing dictionaries is the only way to pass structured data across parts of a system. I've seen lots of people use classes for that. Actually namedtuple is a good fit for that as well.

Without unit testing, the developer would have no reliable way of knowing where these errors will appear

Now why would you not write unit tests?

In Python you don't rely on a compiler to catch your errors. If you really need static checking of your code, you can use one of several static analysis tools out there (see this question)

Upvotes: 1

Related Questions