C++ effective comparison of integer sequences (in relative order)

I would be grateful for a help with an effective implementation of an comparison algorithm in C++. My program gets an input that consists of rows of integer sequences, and I need to find which sequences are duplicates. But some sequences might be shift into side, and it should still be equal. With that I mean for example sequences {0, 1, 22, 5, 9} and {22, 5, 9, 0, 1} should be equal. These sequences or number of duplicate sequences might be of an size.

I can't seem to think of anything that is in some way effective (comparing every new row to all of the rest takes just way too much time), so I hope someone can help. Thanks in advance!

Upvotes: 0

Answers (3)

Ztin

Reputation: 11

It is possible to solve this in time linear to the size of the input (assuming that the integers in the sequences can be sorted in linear time, otherwise there will be an additional log-factor): The key idea is to transform the sequences such that two sequences are equal after the transformation if and only if they were equal up to rotation before.

One such transformation would be the lexicographically smallest rotation. E.g. the smallest rotation of {22, 5, 9, 0, 1}, {5, 9, 0, 1, 22} and {9, 0, 1, 22, 5} is {0, 1, 22, 5, 9}. There are numerous ways to find the smallest rotation of a string S in linear time, perhaps the simplest is using Duval's algorithm (see e.g. https://codeforces.com/blog/entry/90035 or https://en.wikipedia.org/wiki/Lexicographically_minimal_string_rotation).

After rotating, you can identify duplicates using e.g. a hash-set. If you truly want linear worst-case time complexity, you can sort the sequences using radix-sort and then compare consecutive sequences in the sorted list.

Alternatively, you can combine the approaches of Bo Tian and 6502: Compute a rotation-independent hash for each sequence and then compare two sequences S and T with the same hash by searching S in TT using e.g. the Knuth-Morris-Pratt-algorithm (https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm).

Upvotes: 1

Bo Tian

Reputation: 317

If I understand correctly, the problem is to check if two sequences match each other after rotating. For example, the sequence {0, 1, 22, 5, 9} is considered as identical to the sequences

{1, 22, 5, 9, 0}
{22, 5, 9, 0, 1}
{5, 9, 0, 1, 22}
{9, 0, 1, 22, 5}

Obviously, if two sequences have different lengths, they are not identical. If they share the same length, the most simple solution to this problem is to duplicate the second sequence and check if the first one is a sub-sequence of the duplicated sequence.

For example, to check if sequence {0, 1, 22, 5, 9} and {22, 5, 9, 0, 1} are identical, simply check if {0, 1, 22, 5, 9} is part of {22, 5, 9, 0, 1, 22, 5, 9, 0, 1}.

It is straightforward to improve this approach to get rid of the use of extra memory. For example, you can search for the first element of the first sequence in the second sequence and then make a comparison from the matched points.

Upvotes: 2

6502

Reputation: 114461

A solution i can think to is computing an hash that doesn't depend on the rotation. For example:

unsigned long long hash(const std::vector<int>& seq) {
    unsigned long long result;
    for (int i=0,n=seq.size(),j=n-1; i<n; j=i++) {
        result ^= seq[i] * 69069ULL + seq[j];
    }
    return result;
}

Then you can create a std::map mapping hash code to list of indexes in the sequence so you need to do a full check only if the hash is the same.

Upvotes: 3

C++ effective comparison of integer sequences (in relative order)

Answers (3)

Related Questions