Reputation: 423
I feel like this must have been asked before (probably more than once), so potential apologies in advance, but I can't find it anywhere (here or through Google).
Anyway, when explaining the difference between lists and tuples in Python, the second thing mentioned, after tuples being immutable, is that lists are best for homogeneous data and tuples are best for heterogeneous data. But nobody seems to think to explain why that's the case. So why is that the case?
Upvotes: 11
Views: 11974
Reputation: 899
The difference is philosophical more than anything.
A tuple is meant to be a shorthand for fixed and predetermined data meanings. For example:
person = ("John", "Doe")
So, this example is a person, who has a first name and last name. The fixed nature of this is the critical factor. Not the data type. Both "John" and "Doe" are strings, but that is not the point. The benefit of this is unchangeable nature:
You are never surprised to find a value missing. person always has two values. Always.
You are never surprised to find something added. Unlike a dictionary, another bit of code can't "add a new key" or attribute
This predictability is called immutability It is just a fancy way of saying it has a fixed structure.
One of the direct benefits is that it can be used as a dictionary key. So:
some_dict = {person: "blah blah"}
works. But:
da_list = ["Larry", "Smith"]
some_dict = {da_list: "blah blah"}
does not work.
Don't let the fact that element reference is similar (person[0] vs da_list[0]) throw you off. person[0] is a first name. da_list[0] is simply the first item in a list at this moment in time.
Upvotes: 8
Reputation: 251408
First of all, that guideline is only sort of true. You're free to use tuples for homogenous data and lists for heterogenous data, and there may be cases where that's a fine thing to do. One important case is if you need the collection to the hashable so you can use it as a dictionary key; in this case you must use a tuple, even if all the elements are homogenous in nature.
Also note that the homogenous/heterogenous distinction is really about the semantics of the data, not just the types. A sequence of a name, occupation, and address would probably be considered heterogenous, even though all three might be represented as strings. So it's more important to think about what you're going to do with the data (i.e., will you actually treat the elements the same) than about what types they are.
That said, I think one reason lists are preferred for homogenous data is because they're mutable. If you have a list of several things of the same kind, it may make sense to add another one to the list, or take one away; when you do that, you're still left with a list of things of the same kind.
By contrast, if you have a collection of things of heterogenous kinds, it's usually because you have a fixed structure or "schema" to them (e.g., the first one is an ID number, the second is a name, the third is an address, or whatever). In this case, it doesn't make sense to add or remove an element from the collection, because the collection is an integrated whole with specified roles for each element. You can't add an element without changing your whole schema for what the elements represent.
In short, changes in size are more natural for homogenous collections than for heterogenous collections, so mutable types are more natural for homogenous collections.
Upvotes: 17
Reputation: 213368
It's not a rule, it's just a tradition.
In many languages, lists must be homogenous and tuples must be fixed-length. This is true of C++, C#, Haskell, Rust, etc. Tuples are used as anonymous structures. It is the same way in mathematics.
Python's type system, however, does not allow you to make these distinctions: you can make tuples of dynamic length, and you can make lists with heterogeneous data. So you are allowed to do whatever you want with lists and tuples in Python, it just might be surprising to other people reading your code. This is especially true if the people reading your code have a background in mathematics or are more familiar with other languages.
Upvotes: 4
Reputation: 6935
Lists are often used for iterating over them, and performing the same operation to every element in the list. Lots of list operations are based on that. For that reason, it's best to have every element be the same type, lest you get an exception because an item was the wrong type.
Tuples are more structured data; they're immutable so if you handle them correctly you won't run into type errors. That's the data structure you'd use if you specifically want to combine multiple types (like an on-the-fly struct
).
Upvotes: 0