user11472365
user11472365

Reputation:

How to create an intersection with complexity in O (n)?

I need to create a function that intersects two vectors: vectorA ∩ vectorB and allocates the values ​​found in this intersection in a vectorC!

Rules:

void intersections (char a [], int n, char b [], int m, char c [], int * k)

I was thinking of using an algorithm of complexity O(n) which is called "Linear Time". However this algorithm makes it "linear" comparison obviously as the name says.

Example:

A = { 'E', 'C', 'B', 'J', 'S', 'F', 'C', 'V', 'G' }
B = { 'G', 'C', 'M', 'W', 'L', 'O' }
C = { 'G', 'C' }

How far have I come?

Currently I can fetch and compare the corresponding values ​​for the intersection.

What is my difficulty?

1) The values ​​after the comparison can not be added to the vector char c[] with repetitions. How to prevent the same?

2) How to use the pointer int *k (which is the size of the vector char c[]) and allocate the corresponding values ​​without repetitions to it?

bool checkHasStringEqual(char vectorA, char vectorB) {
    string stringA, stringB;
    stringA = toupper(vectorA),
    stringB = toupper(vectorB);

    size_t found = stringA.find(stringB);

    return (found != std::string::npos);
}

void intersection(char a[], char b[], char c[], int n, int m, int *k){
    int indexA = 0, indexB = 0, counterIntersection = 0;

    if(n > m) {
        while(indexA < n) {
            if(indexB != m) {
                if(checkHasStringEqual(a[indexA], b[indexB])) {
                    cout << "STRING A: " << a[indexA] << " == " << "STRING B: " << b[indexB] << endl;
                    counterIntersection++;
                }

                cout << "( " << a[indexA] << "-->" << b[indexB] << ")" << " -- " << "( " << indexA << ", " << indexB << ")" << endl;
                (indexA == n -1) ? (indexA = 0, indexB++) : indexA++;

            } else {
                cout << "CAME TO THE END OF THE ANALYSIS" << endl;
                break;
            }
        }
    }

    if(m > n) {
        //TODO
    }
}

Upvotes: 0

Views: 195

Answers (2)

WhozCraig
WhozCraig

Reputation: 66204

The stated task is doable in linear time using a single lookup table, even without sorting, due to the significantly restricted domain size being managed. char has at-most 1 << CHAR_BIT distinct representations. On almost all systems that is simply 8-bits, or 256 possible values. A char16_t would use 16 bits, or 65536 possible representations, and of course, a char32_t, would have over 2-billion representations.

I'm assuming we're in the domain of the first one, or even two, of these, and not the last. Given that, this is doable with a single markup table indexed by all possible values within the domain. The algorithm is simple:

  1. Start with an table of markers, size = 1 << CHAR_BIT, all initialized to false.
  2. Enumerate either input sequence.
    • For each character take the unsigned char equivalent as an index into the markup table, setting the table value to true
  3. Enumerate the other input sequence.
    • For each character take the unsigned char equivalent value as an index into he markup table. If the value is true, clear it to false and add that character to the output sequence.

When finished, the result will be a unique-set-intersection of the two input character sequences. An example is given below. As it (and many other things) was not specified, I took liberty to make k an in/out argument. On input it refers to the maximum number of chars that can fit in C[], On output it contains the number of chars actually stored.

#include <iostream>
#include <climits>

void mk_intersection(char a[], int n, char b[], int m, char c[], int * k)
{
    bool tbl[1 << CHAR_BIT] = { 0 };

    int o = *k;
    *k = 0;

    // markup a[]
    for (int i = 0; i < n; ++i)
        tbl[static_cast<unsigned char>(a[i])] = true;

    // filter b[] from markup
    for (int i = 0; i < m && *k < o; ++i)
    {
        if (tbl[static_cast<unsigned char>(b[i])])
        {
            tbl[static_cast<unsigned char>(b[i])] = false;
            c[(*k)++] = b[i];
        }
    }
}

int main(int argc, char *argv[])
{

    char A[] = { 'E', 'C', 'B', 'J', 'S', 'F', 'C', 'V', 'G' };
    char B[] = { 'G', 'C', 'M', 'W', 'L', 'O' };
    char C[std::max(sizeof A, sizeof B)] = { 0 };
    int c_len = sizeof C;

    mk_intersection(A, sizeof A, B, sizeof B, C, &c_len);

    for (int i = 0; i < c_len; ++i)
        std::cout.put(C[i]);
    std::cout.put('\n');

    return 0;
}

Output

GC

That's it. Regarding your question of "how to make main O(n log n), frankly, that's nonsense. It implies that you can, at your discretion, presort the input before invoking your intersection operation. In doing so you can implement your function using a simple one-pass merge, which would work, and not have the domain-size limit I've described earlier. But for this example it is neither needed, nor warranted. It isn't required. And frankly, both arguments a[] and b[] could (and should) be const.

Upvotes: 1

VietHTran
VietHTran

Reputation: 2318

If you don't worry about using too much memory, you can initialize 2 256-element integer arrays a_set, b_set which will be used to keep track of the occurrence of each character from 0-255 in array a and b, respectively. By using a_set, you will add any character chr from b to c if and only if a_set[chr] is equal to 1 and b_set[chr] is equal to 0. After adding chr to c, you will set b_set[chr]=1; in order to avoid repetitions of characters.

The time complexity for this approach is O(n + m) --> O(max(n,m)).

I also added the main method below so you can check if it is your desired output for C and k.

#include <iostream>

using namespace std;

bool checkHasStringEqual(char vectorA, char vectorB) {
    return toupper(vectorA) == toupper(vectorB);
}

void intersection(char a[], char b[], char c[], int n, int m, int *k){
    int indexA = 0, indexB = 0, counterIntersection = 0;
    unsigned char chr;
    // unsigned char can only go from 0-255
    int a_set[256] = {0}; // initialize array to 0
    int b_set[256] = {0}; // initialize array to 0
    *k = 0; // initialize k value to 0

    for (int i = 0; i < n; ++i) {
        chr = a[i];
        a_set[chr] = 1;
    }

    for (int i = 0; i < m; ++i) {
        chr = b[i];
        if (a_set[chr] && !b_set[chr]) {
            c[*k] = b[i]; 
            (*k)++; // increase k index
        }
        b_set[chr] = 1; // mark character as inserted
    }
}

int main() {
    int n = 9;
    int m = 6;
    int k;
    char A[n] = { 'E', 'C', 'B', 'J', 'S', 'F', 'C', 'V', 'G' };
    char B[m] = { 'G', 'C', 'M', 'W', 'L', 'O' };
    char C[max(n, m)]; // C will contain at most maximum of n, m elements

    intersection(A, B, C, n, m, &k);
    cout << "Intersection: " << endl; 
    for (int i = 0; i < k; ++i) {
        cout << C[i] << endl;
    }
    return 0;
}

Upvotes: 0

Related Questions