Reputation: 1986
I have two arrays:
int group_id[] = {1, 1, 2, 2, 2, 3, 3, 3};
int value[] = {1, 0, 3, 5, 0, 2, 1, 6};
For each position, I need the largest value in the second array among the elements that share its group_id, not including the element at that position itself. The result (in a new array) would be:
{0, 1, 5, 3, 5, 6, 6, 2}
The real arrays are a lot longer (~10 million elements), so I'm looking for an efficient solution.
Clarification:
The first two elements of value belong to group_id = 1. The first element returns 0 as the highest value, since it can't return itself. The second element returns 1, as it's the largest other value in group_id 1.
The third, fourth and fifth elements (3, 5, 0) belong to group_id 2. The first returns 5, the second returns 3 (as it can't return its own index), and the third returns 5.
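To pin down the expected results, the requirement can be written as a naive reference routine. This is only a sketch: the function name `max_excluding_self_naive` is hypothetical, it runs in quadratic time, and it does not rely on equal group ids being adjacent, so it is useful for checking a faster solution on small inputs rather than for the 10-million-element case.

```c
#include <stddef.h>

/* Naive reference (hypothetical helper name): for each i, scan every other
 * element with the same group_id and keep the largest value seen.
 * O(N^2) time; intended only for checking results on small inputs. */
static void max_excluding_self_naive(const int *group_id, const int *value,
                                     int *result, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        int found = 0;   /* set once another member of the group is seen */
        int best = 0;
        for (size_t j = 0; j < n; j++)
        {
            if (j == i || group_id[j] != group_id[i])
                continue;            /* skip self and other groups */
            if (!found || value[j] > best)
            {
                best = value[j];
                found = 1;
            }
        }
        /* best stays 0 if the group has a single member (invalid input) */
        result[i] = best;
    }
}
```

Called with the two arrays from the question and n = 8, it fills result with {0, 1, 5, 3, 5, 6, 6, 2}.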
It isn't clear that all the elements in group_id with the same number are adjacent (but that is crucial for efficiency).
Good point, you can assume they are all adjacent.
It isn't clear what should happen if there was only one entry in group_id with a given value — there isn't an alternative entry to use, so what should happen (or should the code abandon ship if the input is invalid).
Assume invalid.
Upvotes: 1
Views: 316
Reputation: 754730
The problem can be solved in O(N) time; it does not need O(N•log N) and sorting. This code shows how:
/* SO 5723-6683 */
#include <assert.h>
#include <stdio.h>

static void dump_array(const char *tag, int size, int *data);
static void test_array(const char *tag, int size, int *groups, int *values);

int main(void)
{
    int groups1[] = { 1, 1, 2, 2, 2, 3, 3, 3 };
    int values1[] = { 1, 0, 3, 5, 0, 2, 1, 6 };
    int groups2[] = { 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5 };
    int values2[] = { 1, 1, 3, 5, 0, 2, 1, 6, 6, 3, 5, 5, 5, 3, 2, 3, 7, 3 };
    enum { NUM_VALUES1 = sizeof(values1) / sizeof(values1[0]) };
    enum { NUM_VALUES2 = sizeof(values2) / sizeof(values2[0]) };
    test_array("Test 1", NUM_VALUES1, groups1, values1);
    test_array("Test 2", NUM_VALUES2, groups2, values2);
    return 0;
}

static void test_array(const char *tag, int size, int *groups, int *values)
{
    printf("%s (%d):\n", tag, size);
    dump_array("values", size, values);
    dump_array("groups", size, groups);
    int output[size];
    int grp_size;
    for (int lo = 0; lo < size - 1; lo += grp_size)
    {
        assert(groups[lo+0] == groups[lo+1]);
        grp_size = 2;
        int max_1 = (values[lo+0] < values[lo+1]) ? values[lo+1] : values[lo+0];
        int max_2 = (values[lo+0] < values[lo+1]) ? values[lo+0] : values[lo+1];
        for (int hi = lo + 2; hi < size && groups[hi] == groups[lo]; hi++)
        {
            grp_size++;
            if (values[hi] >= max_1)
            {
                max_2 = max_1;
                max_1 = values[hi];
            }
            else if (values[hi] >= max_2)
                max_2 = values[hi];
        }
        for (int i = lo; i < lo + grp_size; i++)
            output[i] = (values[i] == max_1) ? max_2 : max_1;
    }
    dump_array("output", size, output);
}

static void dump_array(const char *tag, int size, int *data)
{
    printf("%s (%d):", tag, size);
    for (int i = 0; i < size; i++)
        printf(" %d", data[i]);
    putchar('\n');
}
Output from this test program:
Test 1 (8):
values (8): 1 0 3 5 0 2 1 6
groups (8): 1 1 2 2 2 3 3 3
output (8): 0 1 5 3 5 6 6 2
Test 2 (18):
values (18): 1 1 3 5 0 2 1 6 6 3 5 5 5 3 2 3 7 3
groups (18): 1 1 2 2 2 3 3 3 3 3 4 4 4 5 5 5 5 5
output (18): 1 1 5 3 5 6 6 6 6 6 5 5 5 7 7 7 3 7
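One practical point for the ~10 million element case: the test harness above keeps its result in a variable-length array on the stack (`int output[size]`), which at that size (about 40 MB with 4-byte `int`) is likely to exceed typical default stack limits. A minimal sketch of the same two-maxima idea, written to fill a caller-supplied buffer (for example one obtained from `malloc`), might look like this. The name `max_excluding_self` and the choice to return -1 for a single-member group are assumptions made for illustration, not part of the code above.

```c
#include <stddef.h>

/* Hypothetical helper: same largest/second-largest scan per group, writing
 * into a caller-supplied buffer. Assumes equal group ids are adjacent.
 * Returns 0 on success, -1 if some group has only one member. */
static int max_excluding_self(const int *groups, const int *values,
                              int *output, size_t size)
{
    size_t lo = 0;
    while (lo < size)
    {
        /* Find the end of the run of equal group ids starting at lo. */
        size_t hi = lo + 1;
        while (hi < size && groups[hi] == groups[lo])
            hi++;
        if (hi - lo < 2)
            return -1;              /* singleton group: treated as invalid */

        /* Track the largest and second-largest values in the run. */
        int max_1 = values[lo];
        int max_2 = values[lo + 1];
        if (max_1 < max_2)
        {
            int t = max_1;
            max_1 = max_2;
            max_2 = t;
        }
        for (size_t i = lo + 2; i < hi; i++)
        {
            if (values[i] >= max_1)
            {
                max_2 = max_1;
                max_1 = values[i];
            }
            else if (values[i] > max_2)
                max_2 = values[i];
        }

        /* An element equal to the maximum gets the runner-up; if the
         * maximum occurs twice, the runner-up is the maximum as well. */
        for (size_t i = lo; i < hi; i++)
            output[i] = (values[i] == max_1) ? max_2 : max_1;
        lo = hi;
    }
    return 0;
}
```

Each element is visited a constant number of times, so the running time stays O(N), and the only extra memory is the output buffer the caller already needs.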
Upvotes: 2
Reputation: 67
The following code will do it. Its running time is O(Σ n_i·log(n_i)), where n_i is the size of subset i. If the subsets are processed in parallel with MPI or OpenMP, it will be at best O(m·log(m)), where m is the size of the largest subset.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator (ascending): must return negative, zero or positive */
int compare(const void *e1, const void *e2)
{
    int f = *(const int *)e1;
    int s = *(const int *)e2;
    return (f > s) - (f < s);
}

int main(void)
{
    int group_id[] = {1, 1, 2, 2, 2, 3, 3, 3};
    int value[] = {1, 0, 3, 5, 0, 2, 1, 6};
    int n = sizeof(value) / sizeof(value[0]);
    int i, j, k, count, *tmp;
    for (i = 0; i < n; i++)
    {
        /* find subset: elements i..j share the same group_id */
        count = 1;
        for (j = i; j < n - 1 && group_id[j] == group_id[j+1]; j++)
            count++;
        /* copy subset */
        tmp = malloc(sizeof(int) * count);
        if (tmp == NULL)
            return 1;
        memcpy(tmp, &value[i], sizeof(int) * count);
        /* sort ascending: tmp[count-1] is the largest, tmp[count-2] the next */
        qsort(tmp, count, sizeof(*tmp), compare);
        /* print */
        for (k = i; k <= j; k++)
            if (value[k] != tmp[count-1])
                printf("%d ", tmp[count-1]);
            else
                printf("%d ", tmp[count-2]);
        i = j;  /* continue after the end of this subset */
        free(tmp);
    }
    printf("\n");
    return 0;
}
PS: You will probably have to make some modifications to it, but I hope it's enough for what you want (or to get you started). Please be aware that I am assuming each subset has a size of at least 2, and that the greatest value within a subset appears only once.
Upvotes: 2