Looping through and comparing subsets of arrays

Question

I have two arrays:

int group_id[] = {1, 1, 2, 2, 2, 3, 3, 3};
int value[]    = {1, 0, 3, 5, 0, 2, 1, 6};

From the second array, I need to return, for each index, the largest value within the same group_id (not including the value at the current index position). The result (in a new array) would be:

{0, 1, 5, 3, 5, 6, 6, 2}

The arrays are a lot longer (~10 million elements), so I'm looking for an efficient solution.

Clarification:
The first two elements of value belong to group_id = 1. The first element returns 0 as the highest value, since it can't return itself. The second element returns 1, as it's the largest other value in group_id 1.

The third, fourth and fifth elements (3, 5, 0) belong to group_id 2: the first returns 5, the second returns 3 (as it can't return the value at its own index), and the third returns 5.

It isn't clear that all the elements in group_id with the same number are adjacent (but that is crucial for efficiency).

Good point, you can assume they are all adjacent.

It isn't clear what should happen if there was only one entry in group_id with a given value — there isn't an alternative entry to use, so what should happen (or should the code abandon ship if the input is invalid).

Assume invalid.

Jonathan Leffler · Accepted Answer

The problem can be solved in O(N) time; it does not need O(N•log N) and sorting. This code shows how:

/* SO 5723-6683 */
#include <assert.h>
#include <stdio.h>

static void dump_array(const char *tag, int size, int *data);
static void test_array(const char *tag, int size, int *groups, int *values);

int main(void)
{
    int groups1[] = { 1, 1, 2, 2, 2, 3, 3, 3 };
    int values1[] = { 1, 0, 3, 5, 0, 2, 1, 6 };
    int groups2[] = { 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5 };
    int values2[] = { 1, 1, 3, 5, 0, 2, 1, 6, 6, 3, 5, 5, 5, 3, 2, 3, 7, 3 };
    enum { NUM_VALUES1 = sizeof(values1) / sizeof(values1[0]) };
    enum { NUM_VALUES2 = sizeof(values2) / sizeof(values2[0]) };

    test_array("Test 1", NUM_VALUES1, groups1, values1);
    test_array("Test 2", NUM_VALUES2, groups2, values2);
    return 0;
}

static void test_array(const char *tag, int size, int *groups, int *values)
{
    printf("%s (%d):\n", tag, size);
    dump_array("values", size, values);
    dump_array("groups", size, groups);

    int output[size];
    int grp_size;
    for (int lo = 0; lo < size - 1; lo += grp_size)
    {
        assert(groups[lo+0] == groups[lo+1]);
        grp_size = 2;
        int max_1 = (values[lo+0] < values[lo+1]) ? values[lo+1] : values[lo+0];
        int max_2 = (values[lo+0] < values[lo+1]) ? values[lo+0] : values[lo+1];
        for (int hi = lo + 2; hi < size && groups[hi] == groups[lo]; hi++)
        {
            grp_size++;
            if (values[hi] >= max_1)
            {
                max_2 = max_1;
                max_1 = values[hi];
            }
            else if (values[hi] >= max_2)
                max_2 = values[hi];
        }
        for (int i = lo; i < lo + grp_size; i++)
            output[i] = (values[i] == max_1) ? max_2 : max_1;
    }

    dump_array("output", size, output);
}

static void dump_array(const char *tag, int size, int *data)
{
    printf("%s (%d):", tag, size);
    for (int i = 0; i < size; i++)
        printf(" %d", data[i]);
    putchar('\n');
}

Output from this test program:

Test 1 (8):
values (8): 1 0 3 5 0 2 1 6
groups (8): 1 1 2 2 2 3 3 3
output (8): 0 1 5 3 5 6 6 2
Test 2 (18):
values (18): 1 1 3 5 0 2 1 6 6 3 5 5 5 3 2 3 7 3
groups (18): 1 1 2 2 2 3 3 3 3 3 4 4 4 5 5 5 5 5
output (18): 1 1 5 3 5 6 6 6 6 6 5 5 5 7 7 7 3 7
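
For arrays of around 10 million elements, the same single pass could be packaged as a function that writes into a caller-supplied buffer, so the result does not have to live in a variable-length array on the stack. The following is only a sketch under stated assumptions: the name max_excluding_self, its signature, and the -1 return for a single-member group are hypothetical and not part of the answer above. It also tracks the index of the largest value rather than comparing values, which yields the same output when the maximum is duplicated.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (name, signature and error convention are assumptions).
 * For each index i, stores in out[i] the largest value in i's group other
 * than values[i] itself. Equal group ids must be adjacent.
 * Returns 0 on success, or -1 if some group has only one member.
 */
int max_excluding_self(size_t n, const int *groups, const int *values, int *out)
{
    assert(n == 0 || (groups != NULL && values != NULL && out != NULL));

    size_t lo = 0;
    while (lo < n)
    {
        /* Find the end of the run of equal group ids starting at lo. */
        size_t hi = lo + 1;
        while (hi < n && groups[hi] == groups[lo])
            hi++;
        if (hi - lo < 2)
            return -1;  /* no alternative entry to report: invalid input */

        /* Track where the largest value sits, and the second-largest value. */
        size_t top = (values[lo] < values[lo + 1]) ? lo + 1 : lo;
        int second = (top == lo) ? values[lo + 1] : values[lo];
        for (size_t i = lo + 2; i < hi; i++)
        {
            if (values[i] > values[top])
            {
                second = values[top];
                top = i;
            }
            else if (values[i] > second)
                second = values[i];
        }

        /* The largest element reports the runner-up; all others report the largest. */
        for (size_t i = lo; i < hi; i++)
            out[i] = (i == top) ? second : values[top];
        lo = hi;
    }
    return 0;
}
```

A caller would typically allocate out with malloc(n * sizeof *out), check the return value, and free the buffer afterwards.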
