Reputation: 137
Given two strings a, and b of lengths |a| and |b|, compute the number of elements that should be deleted in total such that a and b are anagrams of each other.
The answer is found by counting the number of elements not in the intersection of sets a and b.
What is the computational cost of doing such compuation using a.symmetric_difference(b)
? documentation
Upvotes: 1
Views: 66
Reputation: 28302
As pointed out in the comments, what you'd really need is a sort of symmetric difference for multisets. We can sketch out what an implementation of this could look like and see what the computational cost is.
A simple implementation for strings would be to create two frequency tables for the symbols in the strings, then construct a third frequency table for the symmetric difference.
Constructing the frequency table for each of the input strings is easy:
for each character in the string
if the table contains an entry for the character
increment the entry for the character
otherwise
create an entry for the character with frequency set to one
We may now assume we have our two frequency tables freq1
and freq2
. To construct the symmetric difference we can loop through each table's keys and compare to the other table:
for each character in freq1.keys
f1 = freq1[character]
f2 = 0
if freq2 has an entry for character then set f2 = freq2[character]
if f1 >= f2 then symmdiff[character] = f1 - f2
for each character in freq2.keys
f2 = freq2[character]
f1 = 0
if freq1 has an entry for character then set f1 = freq1[character]
if f2 >= f1 then symmdiff[character] = f2 - f1
Our table symmdiff
now contains enough information to answer your question: if we loop through once more and add up all the frequencies, that will tell us how many total characters must be deleted (though it does not tell us how many to delete from either table specifically, just the combined total; if we wanted to know from each table specifically, we could calculate two tables for each difference, rather than one for the symmetric difference)
sum = 0
for each character in symmdiff.keys
sum = sum + symmdiff[character]
So, what is the computational complexity of this? The worst case is when each string has the same length and each consists of n distinct characters (so 2n distinct characters are used in total). Then, the time complexity is:
Assuming O(1) table inserts/access and ignoring the true cost of adding numbers when computing the sum, this process is basically linear in both time and space. You can think of it as looping over the whole input (the pair of input strings) three times (6n iterations) in the worst case.
Upvotes: 1