Peter Leontev

Reputation: 107

Segments sum algorithm

I am trying to solve the following task:

1) Given an array A of size N.
2) Given a set of range update queries, i.e. (L, R, val), each of which should do A[i] += val for L <= i <= R.
3) Given a set of range sum queries, i.e. (L, R), each of which should return sum(A[i]) for L <= i <= R.

Constraints:

1) The size of A and the sizes of the update and query sets satisfy N, N1, N2 <= 2^24.
2) 0 <= L <= 2^24, 0 <= R <= 2^24, 0 <= val <= 2^24.

The problem is to calculate the sum S of all range sum query results, modulo 2^32.

One could implement a segment tree to get the desired sum in O((N1 + N2) log N) time, but we don't actually need this data structure. Instead, we can somehow calculate S in O(N) time using just 2 or 3 plain arrays. What is the general idea here?

I recently wrote an algorithm in C++ for this problem, but it is not optimal. Pseudocode:

  1. Create two arrays Add[0..N-1] and Subtract[0..N-1].
  2. Iterate over the set of range updates and do Add[L] += val and Subtract[R] += val.
  3. Create array Partial_sum[0..N].
  4. Partial_sum[0] = 0, what_to_add = 0.
  5. For i in [1..N]:
    5.1. Partial_sum[i] = Partial_sum[i - 1] + Add[i - 1] + what_to_add
    5.2. what_to_add = what_to_add + Add[i - 1] - Subtract[i - 1]

We get the Partial_sum array and can then calculate any segment sum (L, R) in O(1) time as Partial_sum[R + 1] - Partial_sum[L].

But the problem is that step 2 is too slow. Also, the loop in step 5 is hard to understand. This is an O(N) solution, but the constant factor is too high. I know there should be a way to improve step 5, but I don't understand how to do it.

Could someone give some ideas or even suggest their own algorithm to solve this problem?

Thank you.

My algorithm implementation:

#include <algorithm> // std::swap
#include <cstring>   // memset
#include <iostream>

typedef unsigned int UINT;
typedef unsigned long long ULL;


//MOD and size of A
const ULL MOD  = 4294967296LL; // 2^32 (UINT arithmetic already wraps modulo this)
const size_t N = 16777216;     // 2^24

//params for next_rand()
UINT seed = 0;
UINT a;
UINT b;


//linear congruential generator producing the query values
UINT next_rand()
{
    seed = seed * a + b;
    return seed >> 8;
}


int main()
{
    UINT N1, N2;

    std::cin >> N1 >> N2;
    std::cin >> a >> b;

    UINT* add  = new UINT[N];         //Add array
    UINT* subs = new UINT[N];         //Subtract array
    UINT* part_sum = new UINT[N + 1]; //Partial sums array

    memset(add, 0, sizeof(UINT) * N);
    memset(subs, 0, sizeof(UINT) * N);
    memset(part_sum, 0, sizeof(UINT) * (N + 1));  //Initialize arrays

    //step 2 
    for (size_t i = 0; i < N1; ++i)
    {
        UINT val = next_rand();
        UINT l   = next_rand();
        UINT r   = next_rand();

        if (l > r)
        {
            std::swap(l, r);
        }

        add[l]  = (add[l] + val);
        subs[r] = (subs[r] + val);
    }

    part_sum[0]   = 0;
    UINT curr_add = 0;

    //step 5
    for (size_t i = 1; i <= N; ++i)
    {
        part_sum[i] = (part_sum[i - 1] + curr_add + add[i - 1]);

        curr_add = (curr_add + add[i - 1] - subs[i - 1]);
    }

    UINT res_sum = 0;

    //Get any segment sum in O(1)
    for (size_t i = 0; i < N2; ++i)
    {
        UINT l = next_rand();
        UINT r = next_rand();

        if (l > r)
        {
            std::swap(l, r);
        }
        res_sum = (res_sum + part_sum[r + 1] - part_sum[l]);
    }

    std::cout << res_sum;

    delete []add;
    delete []subs;
    delete []part_sum;

    return 0;
}

Upvotes: 1

Views: 1123

Answers (2)

Peter Leontev
Peter Leontev

Reputation: 107

I've implemented the described algorithm in a different way, and it should run faster than my original version at the maximum update and sum query counts.

#include <iostream>
#include <stdio.h>
#include <vector>

typedef unsigned int UINT;
typedef unsigned long long ULL;

const ULL MOD  = 4294967296LL; // 2^32
const size_t N = 16777216;     // 2^24

UINT seed = 0;
UINT a;
UINT b;

UINT next_rand()
{
    seed = seed * a + b;

    return seed >> 8;
}

std::vector <std::pair<UINT, UINT> > add;

int main()
{
    UINT upd_query_count;
    UINT sum_query_count;

    // freopen("fastadd.in",  "r", stdin);
    // freopen("fastadd.out", "w", stdout);

    scanf("%u", &upd_query_count);
    scanf("%u", &sum_query_count);
    scanf("%u", &a);
    scanf("%u", &b);

    add.resize(N + 1); // resize, not reserve: the elements are indexed directly below

    for (size_t i = 0; i < upd_query_count; ++i)
    {  
        UINT val = next_rand();
        UINT l   = next_rand();
        UINT r   = next_rand();

        if (l > r)
        {
            add[r].first     += val;
            add[l + 1].first -= val;
        }
        else
        {
            add[l].first     += val;
            add[r + 1].first -= val;
        }
    }

    for (size_t i = 0; i < sum_query_count; ++i)
    {
        UINT l = next_rand();
        UINT r = next_rand();

        if (l > r)
        {
            ++add[r].second;
            --add[l + 1].second;
        }
        else
        {
            ++add[l].second;
            --add[r + 1].second;
        }
    }

    UINT curr_add = 0;
    UINT res_sum  = 0;
    UINT times    = 0;

    for (size_t i = 0; i < N; ++i )
    {
        curr_add += add[i].first;
        times    += add[i].second;

        res_sum += curr_add * times;
    }

    printf("%u\n", res_sum);

    return 0;
}

Upvotes: 1

Jonathan Mee

Reputation: 38919

So add and subs are very large arrays.

The first place you should look for a speedup here is memory access. As N1 becomes large you will end up with a tremendous number of cache misses. This is probably somewhat beyond the scope to explain here, so I'll link: http://en.wikipedia.org/wiki/CPU_cache

As for a way to speed this up, let's try to improve spatial locality by ordering our accesses.

std::vector<std::pair<UINT, UINT>> l(N1); // (index, val) pairs, to be sorted by index
std::vector<std::pair<UINT, UINT>> r(N1);

for(size_t i = 0; i < N1; ++i){
    const UINT val = next_rand();
    const UINT first = next_rand();
    const UINT second = next_rand();

    if(first > second){
        l[i] = std::make_pair(second, val);
        r[i] = std::make_pair(first, val);
    }else{
        l[i] = std::make_pair(first, val);
        r[i] = std::make_pair(second, val);
    }
}
std::sort(l.begin(), l.end());
std::sort(r.begin(), r.end());

for(size_t i = 0; i < N1; ++i){
    add[l[i].first] += l[i].second;
    subs[r[i].first] += r[i].second;
}

Keep a couple of things in mind: std::pair's operator< compares the first elements and, if those are equal, compares the second. That's how I'm able to use std::sort without writing any more code. However, if first is equal for two elements, the one with the highest val will always be the second one added. It doesn't seem like that would be a problem in your current code, but if it becomes one you can solve it by writing your own sort loop, rather than relying on std::sort.

Also depending on how sparse access is to each cache block it may be faster to do your additions in separate loops.

As always, the only way you can really improve performance is by working with actual numbers, so be sure to do your own benchmarking as you're comparing methods.

Upvotes: 0
