Reputation: 1551
I have set A of n elements, size of each element is considerable. I have map M of size k, where k is just indices 0..k and value is elements of one of the subset.
As you can see I need to store k subsets in the map. value of k and n could be in million. So the memory for map is blowing up. As many elements in these k subsets would be common. Its like wasting lot of memory in redundancy.
Is there a way to map a particular subset and map it to key in M, so that program takes space for only n elements. Or any other better way to optimize this memory?
Upvotes: 2
Views: 627
Reputation: 25516
std::vector<Element> allElements;
std::map<unsigned int, std::vector<std::vector<Element>::iterator>>; //(!)
This approach is fine, if you create the set of n elements once and don't ever modify it again (the set itself, you can modify the elements inside).
Changing the set during run-time might invalidate the iterators, though; if you only push_back
new elements, you could store indices into the set instead (edit: for the two fore-mentioned use cases, have a look at Michał Komorowski's approach, too, it appears superior to me, if the subsets are really large as well, but inferior on smaller ones, as you potentially have to store many zeros if an element with large id is contained in the subset). If you remove elements in between, subsequent indices would change, then you need another data structure of which the iterators remain stable even on modifications; such data structures are e. g. std::list
or std::set
(actually std::map
as well, but that is less suitable for the given purpose unless you want some separate, explicit mapping into the set of n elements as well; note that the two unordered variants of set and map don't give that guarantee of stable iterators):
std::list<Element> allElements;
std::map<unsigned int, std::vector<std::list<Element>::iterator>>;
Only thing that you still have to pay attention for is not removing any elements from the std::list
that are yet referred to by any of the subsets.
If there are many equal subsets, you might (especially as the sets apparently are large) additionally spare some memory by sharing these:
std::list<Element> allElements;
std::map<unsigned int, std::shared_ptr<std::vector<std::list<Element>::iterator>>>;
An issue with this approach are modifications of the subsets, they'd be visible to all maps sharing the same subset – which might even be desirable, but if not, you'll need to address the issue in some appropriate manner.
Upvotes: 2
Reputation: 6238
The theory says that there are 2^N possible subsets of a given set. It also means that each subset can be uniquely identified by a number between 0 to 2^N - 1. For example let's assume that our set A has 3 elements X, Y, Z, then:
It means that in your dictionary you do not have to store the entire subset as a value but only an identifier of this subset. The only requirements is that you need to be able to change a given identifier into a subset.
In the example above I did it in a very simple way. Imagine yourself that you have a bit array of length N where N is a length of a set. This array represents a subset. Each bit in this array represents one element in a set. If a given bit is set to 1 it means that a given element is in a subset.
This bit array we can treat as a binary number. And each binary number can be easily translated into a decimal number (your id of a subset) and vice-versa.
Returning to our example. The bit array as below:
|X|Y|Z|
|1|0|1|
Denotes that X and Z are in a subset. Identifier of this subset is 5 (binary 101).
Update 1
If you really have to deal with millions of elements it means that in the apraoch described above you cannot use standard types like
int or long
as identifiers. They cannot store very big values like 2^1_000_000. Instead, you will have to use something like BigInteger
from C# or Java.
Update 2
1 million elements = 1 million bits in a bit array = 125_000 bytes = 122 kilobytes per a subset identifier.
The question is how big are your elements. This solution is useful (for 1 million elements) if 122 kb < average size of elements in a subset.
Upvotes: 4