Travis Su

Reputation: 709

For loop takes a very long time (20+ minutes) when the input contains a large amount of data

I have two functions here:

int getHighestVal(int n, vector<double> arr) {
    int highest = 0;
    for (int i = 0; i < n; i++) {
        if (arr[i] > arr[highest])
            highest = i;
    }

    return highest;
}
vector<int> getRank(int n, vector<double> arr) {
    vector<int> rank(n);
    vector<bool> used(n);
    for (int i = 0; i < n; i++)
        used[i] = false;
    int lowestVal = getHighestVal(n, arr);
    cout << "Pass waypoint lowestVal" << endl;

    for (int i = 1; i <= n; i++) { // THIS LOOP TAKES ~20 MINUTES FOR n = 16200
        for (int j = 0; j < n; j++) {
            if (used[j] == false && arr[lowestVal] > arr[j])
                lowestVal = j;
        }

        rank[lowestVal] = i;
        used[lowestVal] = true;
        lowestVal = getHighestVal(n, arr);
        cout << "\rPass waypoint RANKING Loop2: " << n;
    }
    cout << "Pass waypoint RANKING" << endl;

    return rank;
}

I use these functions in my program, but the for loop in getRank takes nearly 20 minutes to finish when the input vector<double> arr contains 16200 doubles.

Why? That is far too long for only 16200 doubles.

Note: With @bruno's solution, running it in Release mode shortens the time from 1.5 s to 0.3 s. Huge improvement.

Upvotes: 1

Views: 174

Answers (3)

bruno

Reputation: 32596

Because arr is never modified, the value returned by getHighestVal is always the same, so the function needs to be called only once rather than on every iteration of the for loop.

Passing arr by const reference makes the code faster (the vector is no longer copied on every call) and also clearer, because it shows immediately that arr is not modified, without having to read the function bodies.

So with just a few small changes you will save a lot of time (roughly a factor of 5):

#include <iostream>
#include <vector>
using namespace std;

int getHighestVal(int n, const vector<double> & arr) {
    int highest = 0;
    for (int i = 1; i < n; i++) {
        if (arr[i] > arr[highest])
            highest = i;
    }

    return highest;
}

vector<int> getRank(int n, const vector<double> & arr) {
    vector<int> rank(n);
    vector<bool> used(n, false);
    int lowestVal = getHighestVal(n, arr);
    cout << "Pass waypoint lowestVal" << endl;

    for (int i = 1; i <= n; i++) { // formerly the slow loop
        int lo = lowestVal;
        for (int j = 0; j < n; j++) {
            if (used[j] == false && arr[lo] > arr[j])
                lo = j;
        }

        rank[lo] = i;
        used[lo] = true;
        //cout << "\rPass waypoint RANKING Loop2: " << n;
    }
    cout << "Pass waypoint RANKING" << endl;

    return rank;
}

The parameter n only makes sense if not the whole vector has to be considered (n < arr.size()).
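One caveat that applies to the question's loop and to the version above: each search starts from the index of the maximum. If the maximum value occurs more than once (e.g. {5.0, 5.0}), that index is already marked as used when the duplicates are reached, so one element gets its rank overwritten and another keeps rank 0. Below is a minimal sketch of the same O(n²) selection that avoids this by starting every search with "no candidate yet". The name getRankSelection is hypothetical; like the original, it gives rank 1 to the smallest value.

```cpp
#include <cstddef>
#include <vector>

// O(n^2) selection ranking: in each pass pick the smallest value that has
// not been ranked yet (the lowest index wins on ties) and give it the next rank.
std::vector<int> getRankSelection(const std::vector<double>& arr) {
    const std::size_t n = arr.size();
    std::vector<int> rank(n, 0);
    std::vector<bool> used(n, false);

    for (std::size_t r = 1; r <= n; r++) {
        std::size_t best = n;  // n means "no candidate found yet"
        for (std::size_t j = 0; j < n; j++) {
            if (!used[j] && (best == n || arr[j] < arr[best]))
                best = j;
        }
        // at least one element is still unused in every pass, so best < n
        rank[best] = static_cast<int>(r);
        used[best] = true;
    }
    return rank;
}
```

For {5.0, 5.0} this yields ranks {1, 2} instead of leaving one element at 0. It is still quadratic, so it only fixes correctness, not speed.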

Upvotes: 1

Aconcagua

Reputation: 25536

Assuming you always want to create the rank for the entire array, the first parameter n is redundant: you can get the same information from arr.size(). Redundancies can be sources of error, so in this case better drop the parameter:

std::vector<size_t> getRank(std::vector<double> const& arr);

Two other changes:

  • It looks as if a rank will never be negative, so an unsigned type is the better choice. size_t can hold any number of elements you can pack into a std::vector, so it's a fine type. Only if this consumes too much memory would I fall back to a smaller type...
  • Accepting the vector by const reference avoids copying it. You are not going to modify it anyway, so there's no need to create copies. This is especially relevant for your getHighestVal function, which gets called again and again (and copies the whole vector on each call).
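A small sketch to make the cost of a by-value parameter visible. It is not from the question: the type CountedDouble, the counter copyCount and the two functions are hypothetical names for illustration only. The element type counts its copy constructions, so the counter shows how many element copies each call makes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical global counter of element copies.
std::size_t copyCount = 0;

// Hypothetical element type that counts every copy made of it.
struct CountedDouble {
    double value;
    CountedDouble(double v) : value(v) {}
    CountedDouble(const CountedDouble& other) : value(other.value) { ++copyCount; }
};

// Same logic as getHighestVal, but arr is taken by value:
// every call first copies all elements of the caller's vector.
std::size_t highestByValue(std::vector<CountedDouble> arr) {
    std::size_t highest = 0;
    for (std::size_t i = 1; i < arr.size(); i++)
        if (arr[i].value > arr[highest].value)
            highest = i;
    return highest;
}

// Same logic with a const reference: the caller's vector is used directly.
std::size_t highestByConstRef(const std::vector<CountedDouble>& arr) {
    std::size_t highest = 0;
    for (std::size_t i = 1; i < arr.size(); i++)
        if (arr[i].value > arr[highest].value)
            highest = i;
    return highest;
}
```

Calling highestByValue four times on a 3-element vector adds 12 to copyCount, while highestByConstRef adds nothing. In the question, getHighestVal is called n + 1 times with n = 16200, so passing by value alone copies about 262 million doubles (roughly 2 GB), on top of the roughly n² comparisons those calls perform.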

However, there's no need to reinvent the wheel: std::max_element (from <algorithm>) already does the same...
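For reference, a minimal sketch of getting an index (rather than an iterator) from std::max_element; std::distance is equivalent to the iterator subtraction used in the code below. Like getHighestVal with its strict >, std::max_element returns the first occurrence if the maximum appears several times. The name indexOfMax is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Index of the (first) largest element. For an empty vector,
// std::max_element returns end(), so the distance (and result) is 0.
std::size_t indexOfMax(const std::vector<double>& arr) {
    auto it = std::max_element(arr.begin(), arr.end());
    return static_cast<std::size_t>(std::distance(arr.begin(), it));
}
```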

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

std::vector<size_t> getRank(std::vector<double> const& arr)
{
    std::vector<size_t> rank(arr.size());
    std::vector<bool> used(arr.size(), false);
    // Noticed the second argument? It makes the subsequent loop obsolete...
    //for (int i = 0; i < n; i++)
    //    used[i] = false;

    // using std::max_element instead
    auto lowestVal = std::max_element(arr.begin(), arr.end()) - arr.begin();
    // std::max_element returns an iterator, though; to get an index,
    // we need to calculate the difference to the first element

    std::cout << "Pass waypoint lowestVal" << std::endl;

    // now avoid calling std::max_element again and again!
    auto lowestValConst = lowestVal;

    for (size_t i = 1; i <= arr.size(); i++)
    {
        for (size_t j = 0; j < arr.size(); j++)
        {
            if (!used[j] && arr[lowestVal] > arr[j])
                lowestVal = j;
        }

        rank[lowestVal] = i;
        used[lowestVal] = true;

        // avoid re-calculation:
        lowestVal = lowestValConst; //getHighestVal(n, arr);

        std::cout << "\rPass waypoint RANKING Loop2: " << arr.size();
    }
    std::cout << "Pass waypoint RANKING" << std::endl;

    return rank; // falling off the end of a non-void function is undefined behaviour
}

This still remains an O(n²) algorithm. You can do better, though, and get O(n*log(n)):

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

std::vector<size_t> getRank(std::vector<double> const& arr)
{
    std::vector<std::pair<double, size_t>> values;
    values.reserve(arr.size()); // avoid re-allocations
    size_t index = 0;
    for(auto d : arr)
        values.emplace_back(d, index++);

    // copying the array into a second one with indices paired: O(n)

    std::sort(values.begin(), values.end());
    // std::pair already has a lexicographical operator<, so we can use that one;
    // because of the lexicographical comparison it is important to use the
    // double value as first element. The index as second element then, as a
    // bonus, keeps equal values in their original order.
    // Ascending order matches your loop, which gives rank 1 to the smallest
    // value; pass std::greater<std::pair<double, size_t>>() as third argument
    // (needs <functional>) if you want rank 1 for the largest value instead.

    // sorting has complexity of O(n*log(n))

    // we need to copy the indices into the ranks:
    std::vector<size_t> rank(arr.size());
    index = 0;
    for(auto const& v : values)
        // pre-increment: you seem to need 1-based ranks...
        rank[v.second] = ++index;

    // copying back: O(n)

    return rank;
}

The total is now O(n) + O(n*log(n)) + O(n), which is O(n*log(n)).
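A variation on the same O(n*log(n)) idea, not taken from the answer: instead of copying the values into pairs, sort a vector of indices with std::stable_sort and a lambda that compares the referenced values. std::iota fills the indices 0..n-1, and stable sorting keeps equal values in their original index order. Like the question's loop, it gives rank 1 to the smallest value. The name getRankByIndexSort is hypothetical, and the sketch assumes arr contains no NaN, which would break the ordering that sorting relies on.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// O(n log n) ranking: sort indices by the value they refer to, then assign
// 1-based ranks in that order (rank 1 = smallest value).
std::vector<std::size_t> getRankByIndexSort(const std::vector<double>& arr) {
    std::vector<std::size_t> order(arr.size());
    std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, ..., n-1

    // stable_sort keeps equal values in their original index order
    std::stable_sort(order.begin(), order.end(),
                     [&arr](std::size_t a, std::size_t b) { return arr[a] < arr[b]; });

    std::vector<std::size_t> rank(arr.size());
    for (std::size_t pos = 0; pos < order.size(); pos++)
        rank[order[pos]] = pos + 1;
    return rank;
}
```

Compared with the pair-based version, this only moves indices around during the sort, at the cost of an indirect access to arr in every comparison.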

Be aware that the code above is untested; if you encounter a bug, please fix it yourself...

Upvotes: 3

Ravi Kumar

Reputation: 1

I think the loop condition in for (int i = 1; i <= n; i++) should be less than n. Also, pass the vector to the functions by reference instead of by copy.

Upvotes: 0
