Reputation: 220
I need a data structure that supports the following operations efficiently:
1) Add an integer x
2) Delete the integer with the maximum frequency (if more than one element has the maximum frequency, delete all of them).
I am thinking of implementing a segment tree where each node stores the index of its child having largest frequency.
Any ideas or suggestions on how to approach this problem, or how it should be implemented, would be greatly appreciated.
Upvotes: 2
Views: 1037
Reputation: 66
We can use a combination of data structures: a hash map to maintain the frequency mappings, where the key is the integer and the value is a pointer to a "frequency" node representing the frequency value and the set of integers having that frequency. The frequency nodes are maintained in a list ordered by frequency value.
The Frequency node can be defined as
class Freq {
    int frequency;
    Set<Integer> values_with_frequency;
    Freq prev;
    Freq next;
}
The elements HashMap would then contain entries of the form
Entry<Integer, Freq>
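For concreteness, a minimal sketch (in Java) of a wrapper that ties the two structures together; the class name FrequencyStructure and the head/tail fields are illustrative, not part of the answer itself:
import java.util.*;

class FrequencyStructure {
    // integer -> the Freq node it currently belongs to
    Map<Integer, Freq> elements = new HashMap<>();
    // ends of the doubly linked freq_nodes list, kept in ascending frequency order
    Freq head;
    Freq tail;
}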
So, for a snapshot of the dataset such as
a,b,c,b,d,d,a,e,a,f,b
where the letters denote integers, this is how the data structure would look:
c -----> (1, [c, e, f])
      |
      |
e ----|
      |
      |
f ----|
a -----> (3, [a, b])
      |
      |
b ----|
d --> (2, [d])
The Freq nodes would be maintained in a linked list, say freq_nodes, sorted by the frequency value. Note that, as explained below, no log(n) operation is needed to keep the list sorted during the add/delete operations.
The add(x) and delete_max_freq() operations could be implemented as follows.
add(x):
If x is not found in the elements map, check whether the first node of freq_nodes holds the Freq object with frequency 1. If so, add x to that object's values_with_frequency set. Otherwise, create a new Freq object with 1 as the frequency value and x as the (for now only) member of its values_with_frequency set, and insert it at the front of the list.
Otherwise (i.e. if x is already in the elements map), follow the pointer stored in x's entry to its Freq object in freq_nodes, remove x from that object's values_with_frequency field, and note x's current frequency, i.e. the value of elements.get(x).frequency (call it F). If the removal leaves values_with_frequency empty, delete that node from the freq_nodes linked list. Finally, if the next Freq node in freq_nodes has frequency F+1, just add x to its values_with_frequency field; otherwise create a new Freq node, as was done for the missing frequency-1 node above, and insert it right after the current node.
Finally, add the entry (x, Freq) to the elements map, where Freq is the node x now belongs to.
Note that this whole add(x) operation is going to be O(1) in time.
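To make the steps above concrete, here is a rough add(x) for the hypothetical FrequencyStructure sketch introduced earlier; the newNode and unlink helpers are illustrative, and every operation involved is O(1):
// add(x) for the hypothetical FrequencyStructure sketch above.
public void add(int x) {
    Freq curr = elements.get(x);
    if (curr == null) {
        // New element: it belongs in a frequency-1 node at the front of the list.
        if (head == null || head.frequency != 1) {
            Freq node = newNode(1);
            node.next = head;
            if (head != null) head.prev = node; else tail = node;
            head = node;
        }
        head.values_with_frequency.add(x);
        elements.put(x, head);
    } else {
        int F = curr.frequency;
        Freq next = curr.next;
        // Ensure a node with frequency F + 1 sits right after curr.
        if (next == null || next.frequency != F + 1) {
            Freq node = newNode(F + 1);
            node.prev = curr;
            node.next = next;
            curr.next = node;
            if (next != null) next.prev = node; else tail = node;
            next = node;
        }
        next.values_with_frequency.add(x);
        elements.put(x, next);
        // Remove x from its old node, unlinking that node if it is now empty.
        curr.values_with_frequency.remove(x);
        if (curr.values_with_frequency.isEmpty()) unlink(curr);
    }
}

private Freq newNode(int f) {
    Freq node = new Freq();
    node.frequency = f;
    node.values_with_frequency = new HashSet<>();
    return node;
}

private void unlink(Freq node) {
    if (node.prev != null) node.prev.next = node.next; else head = node.next;
    if (node.next != null) node.next.prev = node.prev; else tail = node.prev;
}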
Here's an example of a sequence of add() operations with the subsequent state of the data structure.
add(a)
a -> N1 : freq_nodes : |N1 (1, {a}) | ( N1 is actually a Freq object)
add(b)
a -> N1 : freq_nodes : |N1 (1, {a, b}) |
b -> N1
add(a)
At this point ‘a’ points to N1; however, its current frequency is 2, so we need to insert a node N2 next to N1 in the DLL, after removing ‘a’ from N1's values_with_frequency set {a, b}:
a -> N2 : freq_nodes : |N1 (1, {b}) | --> |N2 (2, {a}) |
b -> N1
The interesting thing to note here is that any time we increase the frequency of an existing element from F to F+1, we need to do the following:
if (next node has a higher frequency than F+1, or we have reached the end of the list):
    create a new Freq node with frequency equal to F+1 (as is done above)
    and insert it next to the current node
else:
    add ‘a’ (the input to the add() operation) to the values_with_frequency set of the next node
The delete_max_freq() operation would just involve removing the last node of the freq_nodes linked list and iterating over the keys in its values_with_frequency set to remove the corresponding keys from the elements map. This operation takes O(k) time, where k is the number of elements with the maximum frequency.
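And a matching delete_max_freq() sketch for the same hypothetical class, which unlinks the tail node of the list and clears its members from the elements map:
// delete_max_freq() for the hypothetical FrequencyStructure sketch above.
public List<Integer> delete_max_freq() {
    if (tail == null) return new ArrayList<>(); // nothing stored
    Freq max = tail;
    // Unlink the last node of the freq_nodes list.
    tail = max.prev;
    if (tail != null) tail.next = null; else head = null;
    // Remove every element that had the maximum frequency from the elements map: O(k).
    for (int x : max.values_with_frequency) elements.remove(x);
    return new ArrayList<>(max.values_with_frequency);
}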
Upvotes: 4
Reputation: 17805
My thoughts:
You will need 2 maps.
Map 1: Integer as key with frequency as value.
Map 2: Frequencies as keys and the set of integers currently at that frequency as values.
Add integer: Add the integer to map 1 (incrementing its frequency). Get the new frequency and add the integer to the set under that frequency key in map 2.
Delete integers: We can maintain the maximum frequency in a variable across these operations. Remove the key with this max frequency from map 2 and decrement the max frequency.
So, adding and deleting performance should be O(1) on average.
In the above scenario, map 1 will still contain integers whose stored frequency is stale after the delete from map 2. When such an integer gets added again, we do an on-demand update in map 1: if its current frequency in map 1 doesn't exist as a key in map 2, that means it was deleted, so we reset its frequency to 1.
Implementation:
import java.util.*;

class Foo {
    Map<Integer, Integer> map1;      // element -> frequency
    Map<Integer, Set<Integer>> map2; // frequency -> elements currently at that frequency
    int max_freq;

    Foo() {
        map1 = new HashMap<>();
        map2 = new HashMap<>();
        map2.put(0, new HashSet<>());
        max_freq = 0;
    }

    public void add(int x) {
        map1.putIfAbsent(x, 0);
        int curr_f = map1.get(x);
        if (!map2.containsKey(curr_f)) {
            // stale frequency: its bucket was removed by a delete(), so reset to 1
            map1.put(x, 1);
        } else {
            map1.merge(x, 1, Integer::sum);
        }
        map2.putIfAbsent(map1.get(x), new HashSet<>());
        map2.get(map1.get(x) - 1).remove(x); // remove from previous frequency set
        map2.get(map1.get(x)).add(x);        // add to current frequency set
        max_freq = Math.max(max_freq, map1.get(x));
        printState();
    }

    public List<Integer> delete() {
        // assumes at least one element is currently stored
        List<Integer> ls = new ArrayList<>(map2.get(max_freq));
        map2.remove(max_freq);
        max_freq--;
        while (max_freq > 0 && map2.get(max_freq).size() == 0) max_freq--;
        printState();
        return ls;
    }

    public void printState() {
        System.out.println(map1.toString());
        System.out.println("Maximum frequency: " + max_freq);
        for (Map.Entry<Integer, Set<Integer>> m : map2.entrySet()) {
            System.out.println(m.getKey() + " " + m.getValue().toString());
        }
        System.out.println("----------------------------------------------------");
    }
}
Demo: https://ideone.com/tETHKV
Note: The call to delete() is amortized; the loop that decrements max_freq is paid for by the preceding add operations, which only ever increase it by one at a time.
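For reference, a small usage example of the class above (the Main wrapper is just illustrative scaffolding):
class Main {
    public static void main(String[] args) {
        Foo foo = new Foo();
        foo.add(1); foo.add(2); foo.add(1); foo.add(3); foo.add(1); foo.add(2);
        System.out.println(foo.delete()); // [1]: the only element with frequency 3
        System.out.println(foo.delete()); // [2]: the max frequency is now 2
    }
}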
Upvotes: 2
Reputation: 156469
Assuming "efficient" refers to the way the complexity of those operations scale, big-O style, I'd consider something consisting of:
When a number is inserted:
1. Look up the number in the hashmap to find its frequency (O(1)).
2. If it was already present, look up that frequency in the tree (O(log N)), remove the number from its collection (O(1)), and if the collection becomes empty, remove the frequency from the tree (O(log N)).
3. Increment the number's frequency and set that value in the hashmap (O(1)).
4. Look up its new frequency in the tree (O(log N)). If it's there, add the number to the collection there (O(1)). If not, add a new node with the number in its collection (O(log N)).
When deleting items with the maximum frequency:
1. Remove the highest-valued node from the tree (O(log N)).
2. For each number in that node's collection, remove that number's entry from the hashmap (O(1) for each number removed).
If you have N numbers to add and remove, your worst-case scenario should be O(N log N) regardless of the actual distribution of frequencies or the order in which numbers are added and removed.
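A rough Java sketch of this approach, using a HashMap for the per-number frequencies and a TreeMap as the ordered tree of frequency buckets; the class and method names are illustrative:
import java.util.*;

class MaxFreqSet {
    private final Map<Integer, Integer> freq = new HashMap<>();            // number -> frequency
    private final TreeMap<Integer, Set<Integer>> byFreq = new TreeMap<>(); // frequency -> numbers

    public void add(int x) {
        int f = freq.getOrDefault(x, 0);
        if (f > 0) {
            // Remove x from its old frequency bucket; drop the bucket if it empties.
            Set<Integer> bucket = byFreq.get(f);
            bucket.remove(x);
            if (bucket.isEmpty()) byFreq.remove(f);
        }
        freq.put(x, f + 1);
        byFreq.computeIfAbsent(f + 1, k -> new HashSet<>()).add(x); // O(log N) tree lookup
    }

    public Set<Integer> deleteMaxFreq() {
        if (byFreq.isEmpty()) return Collections.emptySet();
        // Remove the highest-frequency bucket and all of its numbers.
        Map.Entry<Integer, Set<Integer>> top = byFreq.pollLastEntry(); // O(log N)
        for (int x : top.getValue()) freq.remove(x);                   // O(1) per number
        return top.getValue();
    }
}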
If you know of any assumptions you can make about the numbers being added, it's possible you could make further enhancements, like using an indexed array rather than an ordered tree. But if your inputs are fairly unbounded, this seems like a pretty good structure to handle all the operations you want without getting into O(n²) territory.
Upvotes: 2