Solace

Reputation: 9010

Arrays and Search Algorithms: How is the "average N/2 steps to search an array" average value calculated?

I have just started learning about Data Structures and Algorithms in Java (starting with arrays), and I have two questions.

  1. It appears to me that the "steps" in an algorithm's execution are really the array positions the algorithm visits. They say that insertion in an array takes one step, because the data item is simply placed at the first available position, and so insertion is faster than search.

    But there are really two kinds of operations involved: one is visiting the array position(s), and the other is actually writing the data item to that memory location.

    I am curious why so little thought seems to be given to the actual insertion operation (or, in the case of search, the comparison operation). I have never seen that operation discussed much when algorithms are analyzed.

    Is it something about computer memory that makes visiting a memory location the only thing worth counting, so that actually putting the data item at that location is not given much thought? Doesn't that also consume the computer's resources? (I hope I have put this question clearly!)

  2. Secondly (assuming the array has no duplicates), I have read that the search algorithm (I think this is about linear search) takes an average of N/2 steps (a step being a visit to an array position) to search an array containing N data items. My question is: how is this average value calculated?

Note: I have just started reading Robert Lafore's "Data Structures and Algorithms in Java", and I am not sure what exactly I should read up on (computer memory? how array items are placed in memory? confusion!) to make these things clear to me. So any tips on that are also welcome.

Upvotes: 3

Views: 1937

Answers (5)

Tarik

Reputation: 11209

Ans to q1: Assuming you keep the set of items unsorted in an array and that you know the length of the array, all you have to do to append an item is to assign the new item to the array element immediately following the last element. That takes a single operation. If data is kept in other data structures, inserting an item might be more costly but searching will be faster. So, it's always a tradeoff.

Ans to q2: Assume you have n elements in an array and that you search for each element with the same probability. You will find element 1 in one step, element 2 in two steps... element n in n steps. If you sum 1 + 2 + 3 + ... + n and divide by n, you get n(n+1)/2/n = (n+1)/2. Again, if you decide to keep a sorted array instead of an unsorted array, then you can do a binary search and find an item in log2(n) steps.
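That sum can be checked empirically. The sketch below (my own code, not from the answer; the class and method names are made up) searches for every element of a 100-element array exactly once and compares the measured average number of steps to (n+1)/2:

```java
// Sketch: brute-force check that the average number of positions visited by
// a linear search over n distinct items, each equally likely, is (n + 1) / 2.
public class AverageSearchSteps {
    // Count the positions a linear search visits to find `target`.
    static int steps(int[] a, int target) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == target) return i + 1; // i + 1 positions visited
        }
        return a.length; // not found: full traversal
    }

    public static void main(String[] args) {
        int n = 100;
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i; // distinct items

        long total = 0;
        for (int i = 0; i < n; i++) total += steps(a, a[i]); // search each item once

        double average = (double) total / n;
        System.out.println(average);       // 50.5
        System.out.println((n + 1) / 2.0); // 50.5 -- matches (n+1)/2
    }
}
```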

Upvotes: 1

Mike Nakis

Reputation: 61969

Previous answers seem to get either #1 or #2 right but not both, so eventually I had to write my own:

  1. The term "inserting" is misleading when talking about adding an item to an unordered array. It would be clearer to just speak of "appending". The OP says "the data item is simply inserted at the first available position", and in the case of arrays, the first available position is the one after the last item. (Theoretical arrays are mythical beasts that always have additional free space available at the end.) So, the item will simply be appended at the end of the array, and this is just a single write operation. (If the point of insertion i were in the middle of the array, then all subsequent items would have to be moved up by one position to make room for it, and that would be (N-i) reads and another (N-i) writes, in addition to the one write for storing the item at i.)

  2. For searching for an item, finding the average number of operations for a single search is the same as summing up the number of operations of all possible searches and dividing by the number of searches. Searching for the item at position 1 will take one operation, searching for the item at position 2 will take two operations, and searching for the item at position N will take N operations, so the total is 1 + 2 + 3 + ... + N, which is known to be equal to N*(N+1)/2. Dividing by N gives (N+1)/2, which is equivalent to N/2, because in these types of calculations we disregard constant terms. (The '+1' is a constant term.) Note: all of this assumes not only that there are no duplicates, but also that each sought-for item ends up being found in the array, because searching for an item which is not in the array would result in a full array traversal, which would take N operations.
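The append-versus-insert cost difference in point 1 can be illustrated with a short sketch (all names here are my own, and the fixed capacity stands in for the "mythical" free space at the end of the array):

```java
// Sketch: appending to an array with spare capacity is one write, while
// inserting at index i first shifts the (N - i) later elements up one slot.
public class ArrayInsert {
    static int size = 0;
    static int[] data = new int[10]; // spare space at the end

    // Append: a single write at the first free position.
    static void append(int value) {
        data[size++] = value;
    }

    // Insert at i: shift elements i..size-1 right by one, then write.
    // Returns how many elements had to be moved: (N - i).
    static int insertAt(int i, int value) {
        int moves = size - i;
        for (int j = size; j > i; j--) data[j] = data[j - 1];
        data[i] = value;
        size++;
        return moves;
    }

    public static void main(String[] args) {
        for (int v = 1; v <= 5; v++) append(v); // [1, 2, 3, 4, 5]
        int moved = insertAt(2, 99);            // [1, 2, 99, 3, 4, 5]
        System.out.println(moved);              // 3 elements shifted
    }
}
```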

Upvotes: 2

Eran

Reputation: 393771

  1. An array is typically a random access structure, which means that you can access the i'th element without having to first iterate over the first i-1 elements. Therefore both reading from and writing to an index of the array (which is mapped to some memory address) can be assumed to take a constant amount of time.

  2. Linear search requires you to iterate over the elements of the array until you find the one you are looking for. The average running time is a matter of probability. The probability of the i'th element being the one you are looking for is 1/n, and you will have to iterate over i elements to find it. Therefore, the expected running time is:

    1/n * (1 + 2 + ... + n) = 1/n * (n+1)*n/2 = (n+1)/2

Upvotes: 1

Tobb

Reputation: 12180

Second question first: It's simple really. If you have an array with 100 positions and search for something that is equally likely to be in any position, there is a 1/100 chance that it is in each array slot. Likewise, you are equally likely to have to visit anywhere from 1 to 100 positions of the array to find what you are looking for.

The best case would then be 1 (you find it in the first slot), the worst case would be 100 (it's in the last slot), and the average is in between these two (the positions being equally likely), so about 50. (There is an equal chance of having to visit more than half the array slots and less than half the array slots.)

Translated to variable numbers, you get:

Best case: 1

Worst case: N

Average: N/2

First question:

Learning some C would probably make this clearer, because an array is nothing more than addressed blocks of memory. Say you have an array of ints; then each int takes up a fixed piece of memory (for instance 16 bits / 2 bytes). When you have an array of, say, 100 ints, you have 100 16-bit memory blocks one after another, so to access one of them directly you only need to know the address of the first block and then add 16 bits * position to find the correct memory block. The reference to the array is nothing more than a reference to the memory block of the first element, so in order to find the int in position 42, you visit the memory block at the address of the first block + 42 * the length of each block.
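That offset calculation can be written out directly. This is just a sketch with made-up numbers (a hypothetical base address of 0x1000 and the 2-byte ints from the example above):

```java
// Sketch: the element at `index` lives at base + index * elementSize.
// The cost is one multiply and one add, regardless of how large index is --
// which is why array access is considered constant time.
public class AddressArithmetic {
    public static void main(String[] args) {
        long base = 0x1000;   // hypothetical address of element 0
        long elementSize = 2; // 2 bytes per 16-bit int, as in the answer
        long index = 42;

        long address = base + index * elementSize;
        System.out.println(Long.toHexString(address)); // 1054
    }
}
```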

By contrast, a linked list, where each element contains a reference to the next element (and the elements are not necessarily even close to each other in memory), needs more than constant time to find an element in position X, since you have to pass through all the elements before position X to get to it.
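A minimal singly linked list sketch (my own code, not from the answer) makes the contrast concrete: reaching position x means following x links, so access cost grows with x instead of staying constant:

```java
// Sketch: in a singly linked list, getting to position x requires
// following x `next` references -- O(x), not constant time.
public class LinkedWalk {
    static class Node {
        int value;
        Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    // Walk the chain, counting the links followed to reach position x.
    static int hopsTo(Node head, int x) {
        int hops = 0;
        Node cur = head;
        while (hops < x) { cur = cur.next; hops++; }
        return hops;
    }

    public static void main(String[] args) {
        Node head = null;
        for (int i = 9; i >= 0; i--) head = new Node(i, head); // values 0..9
        System.out.println(hopsTo(head, 7)); // 7 links followed
    }
}
```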

Upvotes: 1

Positive

Reputation: 62

The search time for data in an array or other data structure depends largely on the search algorithm you are using. Some algorithms have good time complexity but poor space complexity, and vice versa. A divide-and-conquer algorithm such as binary search takes log2(N) time. Calculating a running time such as N/2 always requires a good understanding of the algorithm. In addition, time complexity is analyzed in several cases: best case, worst case, and average case. The N/2 figure belongs to the average case.

Upvotes: -1
