Reputation: 157
While working with arrays in any language, I have always wondered why does the base address or index number of an array start with zero.
int x[5]={21,34,55,314,45};
now if I want to access any the first value of an array I would have to use x[0]
, but why 0 is there any logic behind this?
Upvotes: 10
Views: 6960
Reputation: 2855
Neither "half-open intervals" nor "circular buffers" are valid reasons to justify why one option is distinctly superior. The following illustrates the existence of constructs with identical verbosity and complexity for either form of array-indexing.
[1] Half-open intervals
A common argument in favor of zero-based indexing is claiming
0 <= idx < n
being more elegant than either of these
1 <= idx <= n
1 <= idx < n + 1
[0, n)
is simply a form of half-open intervals, specifically, a right-open interval. It's equivalent counter part is a left-open interval (0, n]
.
For one-based indexing, the equivalent expression would be
0 < idx <= n
Thus allowing the idiomatic zero-based for()
loop construct to be directly repurposed for one-based indexing with a very minor tweak
for (idx = 0; idx < n; idx++) {..} # zero-based
for (idx = 0; idx++ < n; ) {..} # one-based
The risk of "off-by-one" errors are equivalent in both forms, because the initial condition, relational comparison operator, the high side threshold/cap, as well as the value of idx
being compared against, are identical in both forms.
The sole difference being when post-increment action is performed - before each loop cycle, or afterwards. If anything, one-based variant has a slight reduction in verbosity.
[2] Circular Buffers
Another common argument in favor of zero-based indexing is claiming
idx = (idx + 1) % n
being more elegant than requiring the extra + 1
every time
idx = (idx + 1) % n + 1
in a circular or cyclical data structure. Right off the bat, one observes the actual equivalent expression for one-based indexing should be
idx = (idx) % n + 1
idx = idx % n + 1
(once you prune out the now-superfluous pair of(
)
)
Both zero- and one-based variants perform the exact same set of operations, in a different order. But are modulos the only way to do a circular buffer ?
In the unsigned-only sense, what modulo %
actually does is same as multiplying a boolean outcome
idx *= ++idx < n
because the underlying objective is simply - reset the index (idx
) at any point of your choosing such that its long-run cyclicality has a period of n
. Zero-based indexing requires resetting to zero, so the appropriate operators are either modulo %
or multiplication *
.
So what's the equivalent resetting mechanism/operator for one-based indexing ? Exponentation -
idx ^= idx++ < n
...because a FALSE boolean outcome means taking the zero-th power of the base, thus resetting the index back to 1. The key difference being that zero-based index resets upon idx % n == 0
, one-based resets upon idx % n == 1
but only when the input size exceeds the size of the circular buffer.
In fact, any one of these 5 expressions achieve the same circular buffer for one-based indexing :
idx = idx % n + 1
idx ^= idx++ < n
idx = ++idx^(idx <= n)
idx = (1+idx)^(idx < n)
idx += (1 - n)^(idx == n)
1=1 2=2 3=3 4=1 5=2 6=3 7=1 8=2 9=3
1=1 2=2 3=3 4=1 5=2 6=3 7=1 8=2 9=3
1=1 2=2 3=3 4=1 5=2 6=3 7=1 8=2 9=3
1=1 2=2 3=3 4=1 5=2 6=3 7=1 8=2 9=3
1=1 2=2 3=3 4=1 5=2 6=3 7=1 8=2 9=3
[ps:] Zero-based indexing also means there isn't a single unsigned value for representing a never-possibly-valid and/or non-existent index value, unless one either
arbitrarily carving out and designating
UINT_MAX
(of any integer width) for such a value (and a comprehensive framework to enforce this constraint),
opts for singular constructs such as
None
/nil
/null
, which likely entails returning a list or tuple, multiple values, or an encapsulated data type, or
catch exceptions being thrown, leading to functional "impurity" in the strict functional programming sense.
Instead of being forced to choose among 3 simplicity-compromising verbosity-amplifying options, with one-based indexing, the choice for such a special value is self-apparent :
0 -
Sensible mappings for boolean interpretation of these index values come free of charge - all valid indices are boolean True
, and the special value for never-possibly-valid / non-existent index is boolean False
In other words, instead of having another function for index_exists()
, just directly type recast int -> bool
the unsigned return value of an index()
function.
Upvotes: 0
Reputation: 756
Computing often - but not always - involves modular arithmetic since we are doing operations to data items held in contiguous blocks of memory that is itself laid out physically in arrays of set width and/or height or in sectors or cylinders. After reaching the memory element at the extreme column we want our incrementor to return to the first element on the next row.
Yes, we can do this via:
Increment
i := (i % n) + 1
Decrement
i := (n + i - 2) % n + 1
But it is clearer and more efficient to do it with indices starting at zero:
Increment
i := (i + 1) % n
Decrement
i := (n + i - 1) % n
Upvotes: 1
Reputation: 12181
Just to build upon the other answers, the index is an offset. When you create an array, the program lays out enough consecutive memory locations to "fit" an array of that size. In C and related languages, the variable is storing a pointer to the first item.
Assume that each location is 32 bits (which may or may not be true) and a first address of, say, 200. (Yes, I do realize that it's probably terrible to do this in decimal, but bear with me). The first item is at address (200 + (32 * 0)) = 200
. The second item is at address 200 + (32 * 1) = 232
. The third item is at address 200 + (32 * 2) = 264
. Etc.
The main reason you can access arbitrary items in the array in constant time is that you can just do pointer arithmetic to find any arbitrary item in the array, which can be done in constant time.
Upvotes: 1
Reputation: 112
When working with sub-sequences of natural numbers, the difference between the upper bound and the lower bound should be the length of the sub-sequence. The indices of an array can be thought of as a special kind of such a sub-sequence.
The lower bound should be inclusive, the upper bound should be exclusive. In other words, the lower bound should be the first index of the array. Otherwise, we risk having to have a lower bound in the unnatural numbers for some sub-sequences.
If we want to maintain conditions (1) and (2), then we effectively have two choices for upper and lower bounds: 1 <= i < N+1 or 0 <= i < N. Clearly, putting N+1 in the range is ugly, so we should prefer indexing starting from 0.
Upvotes: 1
Reputation: 7753
In general, the logic of working with zero-indexed arrays is considered to be simpler, once you think of the indices as offsets from the origin rather than as "places in a list": you start at the start of the list, and the first item is where you are when you start, ie, when you've gone zero steps.
In a sense, though, it's fundamentally arbitrary and probably convention is the best answer. If we consider that most modern languages were designed by people who learned C very early on in their careers, or more recently who based their designs on those C-inspired languages, the choice of zero-indexed arrays would have been a pretty big change which would have required a lot of justification, which nobody seems to have found, or possibly even looked for. However, if there were ever a real reason to use 1-indexed arrays, there wouldn't be any real reason not to use them.
Of course, as we get into more modern language design the idea of caring about the index of an array element is receding from relevance. For example, python programmers loop over an index like this:
for element in lst:
do_stuff_to(element)
and therefore don't have a reason to care about the index. Ruby has even hipper ways of dealing with this problem, which I won't illustrate here but which you should look into.
Upvotes: 2
Reputation: 777
In C, the name of an array is essentially a pointer, a reference to a memory location, and so the expression array[n] refers to a memory location n-elements away from the starting element. This means that the index is used as an offset. The first element of the array is exactly contained in the memory location that array refers (0 elements away), so it should be denoted as array[0]. Most programming languages have been designed this way, so indexing from 0 is pretty much inherent to the language as most of the languages (not all) follow C standards. You can refer this link for more details.
Upvotes: 6