Reputation: 1263
I have to process each element of an array the same way once and have to modify each by an unpredictable pattern afterwards aswell.
Is there any difference in performance between these snippets and why, if so?
std::vector<int> nums;
//fill nums array
for(unsigned int i = 0; i < nums.size(); ++i){
nums[i] *= nums[i];
if(nums[i] < 10){
nums[i] = 0;
}
}
std::vector<int> nums;
//fill nums array
for(unsigned int i = 0; i < nums.size(); ++i){
nums[i] *= nums[i];
}
for(unsigned int i = 0; i < nums.size(); ++i){
if(nums[i] < 10){
nums[i] = 0;
}
}
Would a different approach like this improve anything?
std::vector<int> nums;
//fill nums array
std::vector<int> flags;
flags.resize(nums.size());
for(unsigned int i = 0; i < nums.size(); ++i){
nums[i] *= nums[i];
flags[i] = nums[i] < 10;
}
for(unsigned int i = 0; i < flags.size(); ++i){
nums[i] = (!flags[i]) * nums[i]; //!flags[i] is 0 if nums[i] < 10
}
Upvotes: 3
Views: 693
Reputation: 69882
Test results on my machine (100 million ints in the array):
checkpoint
starting test: approach1
test took: 84ms
starting test: approach2
test took: 190ms
starting test: approach3
test took: 529ms
starting test: Bathsheba's idea
test took: 61ms
Incidentally, writing idiomatic code is often the most efficient of all. clang, gcc et. al. optimisers are fantastic:
void approach5(std::vector<int> &nums) {
auto filter = [](auto x) { return (x < 10) ? 0 : x; };
auto grow = [filter](auto x) { return filter(x * x); };
std::transform(begin(nums), end(nums), begin(nums), grow);
}
Here's the code.
#include <iostream>
#include <chrono>
#include <random>
#include <array>
#include <vector>
#include <algorithm>
auto constexpr test_size = std::size_t(100'000'000);
using namespace std::literals;
void approach1(std::vector<int> &nums) {
for (unsigned int i = 0; i < nums.size(); ++i) {
nums[i] *= nums[i];
if (nums[i] < 10) {
nums[i] = 0;
}
}
}
void approach2(std::vector<int> &nums) {
for (unsigned int i = 0; i < nums.size(); ++i) {
nums[i] *= nums[i];
}
for (unsigned int i = 0; i < nums.size(); ++i) {
if (nums[i] < 10) {
nums[i] = 0;
}
}
}
void approach3(std::vector<int> &nums) {
std::vector<int> flags;
flags.resize(nums.size());
for (unsigned int i = 0; i < nums.size(); ++i) {
nums[i] *= nums[i];
flags[i] = nums[i] < 10;
}
for (unsigned int i = 0; i < flags.size(); ++i) {
nums[i] = (!flags[i]) * nums[i]; //!flags[i] is 0 if nums[i] < 10
}
}
void approach4(std::vector<int> &nums) {
for (std::size_t i = 0; i < nums.size(); ++i) {
nums[i] *= (nums[i] <= 3 ? 0 : nums[i]);
}
}
auto test = [](auto &&name, auto &&approach, auto &&data) {
std::cout << "starting test: " << name << std::endl;
auto my_data = std::vector<int>(data.begin(), data.end());
auto now = std::chrono::high_resolution_clock::now();
approach(my_data);
auto then = std::chrono::high_resolution_clock::now();
auto diff = std::chrono::duration_cast<std::chrono::milliseconds>(then - now);
std::cout << "test took: " << diff.count() << "ms" << std::endl;
};
std::array<int, test_size> test_data;
int main() {
std::random_device rd;
std::default_random_engine eng(rd());
std::uniform_int_distribution<> dist(0, 100);
std::generate(test_data.begin(), test_data.end(), [&]() { return dist(eng); });
std::cout << "checkpoint" << std::endl;
test("approach1", approach1, test_data);
test("approach2", approach2, test_data);
test("approach3", approach3, test_data);
test("Bathsheba's idea", approach4, test_data);
}
Upvotes: 3
Reputation: 19
In Algorithm analysis First pattern: time complexity is size of array O(n) Second pattern: time complexity is max between 2 loops O(n) and O(n) so it will be O(n) also.
First and second patterns are a same.
Third pattern not improve any thing.
Conclusion: 3 patterns are techniques not more.
Upvotes: 0
Reputation: 234715
The first one has fewer statements (which is always a good rough guide: fewer similar statements often means better performance), but above everything else, it is far more readable.
Don't micro-optimise: leave that to the compiler.
If you are in any doubt check the generated assembly / profile the performance.
Finally (apologise, for I couldn't resist), if nums
was a std::vector<unsigned>
, then you could have written
for (std::size_t i = 0; i < nums.size(); ++i){
nums[i] *= (nums[i] <= 3 ? 0 : nums[i]);
}
which could help the branch predictor.
Upvotes: 4