Priyanka.Patil

Reputation: 1197

Given a situation, how to decide on a data structure?

I'm preparing for technical interviews and have mostly faced situation-based questions. Often the situation involves a large dataset, and I'm asked to decide which data structure would be the most suitable to use.

I'm familiar with most data structures, their implementations, and their performance characteristics. But when given a concrete situation, I struggle to be decisive about which structure to choose.

I'm looking for steps or a decision procedure to follow that can help me arrive at the optimal data structure within the time constraints of an interview.

Upvotes: 0

Views: 517

Answers (2)

Suchith J N

Reputation: 11

It depends on what operations you need to support efficiently.

Let's start with the simplest example: you have a large list of elements and you have to find a given element. Let's consider various candidates.

You can use a sorted array to find an element in O(log N) time using binary search. What if you want to support insertion and deletion as well? Inserting an element into a sorted array takes O(N) time in the worst case. (Think of adding an element at the beginning: you have to shift all the elements one place to the right.) This is where binary search trees (BSTs) come in. A balanced BST can support insertion, deletion and search in O(log N) time.
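As a minimal sketch of the sorted-array side (in Python, using the standard bisect module), the search is O(log N) but keeping the array sorted on insert costs O(N), since the list has to shift elements:

```python
import bisect

data = [3, 7, 11, 15, 22, 40]  # sorted array

def contains(arr, x):
    """Return True if x is in the sorted list arr, via binary search: O(log N)."""
    i = bisect.bisect_left(arr, x)
    return i < len(arr) and arr[i] == x

print(contains(data, 15))  # True
print(contains(data, 16))  # False

# insort finds the position in O(log N), but the underlying list.insert
# shifts elements one place to the right, so the insert is O(N) overall.
bisect.insort(data, 16)
print(data)  # [3, 7, 11, 15, 16, 22, 40]
```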

Now suppose you also need to support finding the minimum and maximum. In a sorted array, that is just returning the first and the last element respectively, so the complexity is O(1). In a BST, assuming it is balanced (like a red-black tree or an AVL tree), finding the min and max takes O(log N) time. Consider another situation where you need to return the kth order statistic. Again, the sorted array wins. As you can see, there is a tradeoff, and it really depends on the problem you are given.
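On a sorted array all three of these are plain index lookups; a quick sketch:

```python
data = [3, 7, 11, 15, 22, 40]  # sorted array

minimum = data[0]           # O(1)
maximum = data[-1]          # O(1)
k = 3
kth_smallest = data[k - 1]  # kth order statistic, also O(1)

print(minimum, maximum, kth_smallest)  # 3 40 11
```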

Let's take another example. You are given a graph with V vertices and E edges and you have to find the number of connected components. That can be done in O(V+E) time using depth-first search (assuming an adjacency-list representation). Now consider a situation where edges are added incrementally and the number of connected components can be asked at any point in the process. There, the Disjoint Set Union (union-find) data structure with the union-by-rank and path-compression heuristics is extremely fast.
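A minimal union-find sketch in Python (class and method names are illustrative), tracking the component count as edges arrive:

```python
class DSU:
    """Disjoint Set Union with path compression and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.components = n  # every vertex starts as its own component

    def find(self, x):
        # Path compression: make every visited node point directly to the root.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return  # already in the same component
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        self.components -= 1

# Edges arrive one at a time; the component count is available at any point.
dsu = DSU(5)
for a, b in [(0, 1), (2, 3), (1, 2)]:
    dsu.union(a, b)
    print(dsu.components)  # prints 4, then 3, then 2
```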

One more example: you need to support range updates and range-sum queries efficiently, and no new elements are ever inserted into the array. Say you have an array of N elements and Q queries. There are two choices. If the range-sum queries come only after all of the update operations (say Q' of them), then you can preprocess the array in O(N+Q') time and answer any query in O(1) time (store prefix sums).

What if no such order is enforced? You can use a segment tree with lazy propagation. It can be built in O(N) time and each operation performed in O(log N) time, so you need O(N + Q log N) time in total. And what if insertion and deletion must be supported along with all these operations? You can use a treap, a probabilistic data structure in which all of these operations take O(log N) time (using an implicit treap).
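For the first case (all updates before all queries), here is a minimal sketch of the prefix-sum approach, using a difference array so that each range update is O(1) and the whole preprocessing is O(N+Q'):

```python
def build(arr, updates):
    """Apply range updates (l, r, v: add v to arr[l..r]) via a difference
    array, then build prefix sums so any later range-sum query is O(1)."""
    n = len(arr)
    diff = [0] * (n + 1)
    for l, r, v in updates:      # each update is O(1)
        diff[l] += v
        diff[r + 1] -= v
    prefix = [0] * (n + 1)       # prefix[i] = sum of the final arr[0..i-1]
    running = 0
    for i in range(n):
        running += diff[i]       # total amount added to arr[i] so far
        prefix[i + 1] = prefix[i] + arr[i] + running
    return prefix

def range_sum(prefix, l, r):
    """Sum of the updated array over [l, r], in O(1)."""
    return prefix[r + 1] - prefix[l]

prefix = build([1, 2, 3, 4, 5], updates=[(1, 3, 10)])  # add 10 to indices 1..3
print(range_sum(prefix, 0, 4))  # 45  (1 + 12 + 13 + 14 + 5)
print(range_sum(prefix, 2, 3))  # 27  (13 + 14)
```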

Note: constants are omitted in Big-O notation, and some of these structures hide a large constant in their complexities.

Upvotes: 1

Stefan Haustein

Reputation: 18813

  • Start with common data structures. Can the problem be solved efficiently with arrays, hashtables, lists or trees (or a simple combination of them, e.g. an array of hashtables or similar; see the sketch after this list)?

  • If there are multiple options, just iterate over the runtimes of the common operations. Typically one data structure is a clear winner for the scenario set up in the interview. If not, tell the interviewer your findings, e.g. "A takes O(n^2) to build but then queries can be handled in O(1), whereas for B build and query time are both O(n). So for one-time usage I'd use B, otherwise A". Space consumption might be relevant in some cases, too.

  • Highly specialized data structures (e.g. prefix trees, aka "tries") are often exactly that: highly specialized for one specific case. The interviewer should usually be more interested in your ability to build useful stuff out of an existing general-purpose library, as opposed to knowing all kinds of exotic data structures that may not have much real-world usage. That said, extra knowledge never hurts; just be prepared to discuss the pros and cons of anything you mention (the interviewer may probe whether you are just "name dropping").
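As a quick illustration of the first point, here is a hypothetical grouping task solved by combining two of the common structures (a hashtable whose values are lists):

```python
from collections import defaultdict

def group_by_length(words):
    """Group words by their length in O(n): a hashtable of lists."""
    groups = defaultdict(list)
    for w in words:
        groups[len(w)].append(w)
    return groups

print(dict(group_by_length(["tree", "heap", "trie", "array", "map"])))
# {4: ['tree', 'heap', 'trie'], 5: ['array'], 3: ['map']}
```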

Upvotes: 0
