ASHISH GUPTA
ASHISH GUPTA

Reputation: 61

Performance comparison between unordered_set vs linked list find

I am converting a C file to C++. Since the functions will still be called from C code I will place the entire file in extern "C" block. The file contains following code-

struct node{
    char* name;
    struct node* next;
};

static struct node* list;  //file scope

void insertInList(FILE*){
    read file line-by-line and add names present in file to the list
}
bool isNamePresent(char* name){
    //iterate through Linked-list & returnt true if present
}

Now, it appears to me that the complexity of 'isNamePresent' can be improved by using unordered_set . But, looking at customer usage it appears that generally few names are entered in the list.(sometimes it is just 1)

Q1) So, should I still change the code to use unordered_set? Will it be still considered a good change in terms of performance or any other terms?If so please explain why?? Also, does scenarios like "what if user enters hundred thousand names in file" are considered during software development when we know the general use pattern?

Q2) How should I write the set in the file? What is the difference between following lines written in global space-

static std::unordered_set<std::string> st;
vs 
namespace{
    static std::unordered_set<std::string> st;
}//anonymous namespace

Is the first one initialized with some garbage value?

Upvotes: 1

Views: 420

Answers (1)

Michael Kenzel
Michael Kenzel

Reputation: 15951

In general, the only way to really know which approach will perform best in your application scenario is to measure the performance of each approach in your application scenario. That being said, I would simply go with the unordered_set. My main reason for that would be readability. unordered_set<string> conveys very clearly what you are doing here: storing a bunch of strings for the purpose of keeping track of a set of elements and efficiently checking whether a given string is part of the set (because that's about the only thing you can really do with an unordered_set). On the other hand, a linked list can be used for many purposes, implementing a set not being a very common one. One would have to infer what the list is being used for from the way the list is being used.

Furthermore, while unordered_set is not necessarily the most efficient hash-table one could imagine, it's not that bad, and searching a linked list is most likely worse. While linear search in a contiguous container such as an std::vector may have a performance advantage in some cases when there are only a few items, the reason for this advantage stems from the fact that iterating through a contiguous piece of memory is very efficient on modern processors. A linked list is not generally contiguous. Even if the list items happen to be allocated in a contiguous way, there is still a memory and runtime overhead compared to iteration through a plain vector. The main advantage of a linked list over an std::vector is that the list supports random insertion in O(1) time complexity and that pointers to list items stay valid if the list is modified. It would seem that neither of these properties are relevant to your case. unordered_set also has O(1) average insertion time complexity. And it has O(1) average lookup time complexity (compared to O(n) for the list). While lookup in an unordered_set will generally involve a few indirections, lookup in a linked list will almost certainly involve more.

So if you want to choose here, the choice should most likely be between std::unordered_set and std::vector. Unless you really need one of the properties only a linked list can give you (e.g., pointers to items stay valid when the container is modified; however, in this case, you may also want to consider std::set instead of a linked list). If you don't, I would go with the std::unordered_set. If performance is really critical (which it most likely is not given that a simple linked list seems to have worked sufficiently well so far): measure, compare, profile…

Concerning your second question: there is no real difference between

static std::unordered_set<std::string> st;

and

namespace {
    std::unordered_set<std::string> st;
}

These are just two different ways to make things have internal linkage. In C++, I would use the unnamed namespace (note: no need for static if you're already using an unnamed namespace) as it would seem more C++-esque. static is generally used for making static local and member variables; this particular usage of static, to make global variables with internal linkage, is mainly there for C compatibility…

Upvotes: 2

Related Questions