Reputation: 1798
I have a 2 dimensional array with fixed number of columns=4, but the number of rows is very large. I found out using vector<tuple>
instead of vector<array>
or was nearly 2x faster, and 5x faster than vector<vector>
. Though using a tuple
is quite a headache. I defined a access function to get around it:
typedef tuple<int,int,int,int> Tuple;
// access the i-th element
inline int &tget (Tuple &t, int i) {
switch (i) {
case 0: return std::get<0>(t);
case 1: return std::get<1>(t);
case 2: return std::get<2>(t);
case 3: return std::get<3>(t);
}
}
vector<Tuple> array;
// this gives the element in i-th row and j-th column
tget(array[i],j);
I found this quite surprising and was wondering where this performance boost coming from, and if there is a neat solution with a syntax similar to vector<vector>
or vector<array>
using subscript operator[]
with the same performance as tuple
?
UPDATE:
I'm using gcc4.8
with c++11
standards and I use -O3
for optimization.
Because people asked about the operations The operations are quite basic. First the 2d array is filled up with up to 1M rows. First I reserve the space and then use push_back
function to add elements. Then I sort the array, and finally I do a binary search on it.
This is the code block I wrote. The input will be several test cases (=T) and each test case takes an integer (N up to 20), and then reads N more integers in lens
. Then because of the exponential nature of the function the sums get quite big (4^(N/2) up to roughly a million). To test between array and tuple, toggle the typedef and comment/uncomment the return line in the access function. There are four lines/blocks that have to be commented/uncommented to switch back and forth between array
and tuple
, and the lines are designated in the comments (// if array
// if tuple
comments) :
#include <iostream>
#include <algorithm>
#include <vector>
#include <tuple>
#include <array>
using namespace std;
typedef vector<int>::iterator VecIt;
// HERE you can switch between the two types
typedef tuple<int,int,int,int> Tuple; // define it as a tuple
//typedef array<int,4> Tuple; // define it as array
inline int &tget (Tuple &t, int i) {
// return t[i]; // if it's an array, uncomment This too
// if it's an array comment the switch case
switch (i) {
case 0: return std::get<0>(t);
case 1: return std::get<1>(t);
case 2: return std::get<2>(t);
case 3: return std::get<3>(t);
default:
cerr << "tuple index out of bound" << endl;
}
}
void comp_sums(VecIt v, VecIt vend, vector<Tuple> &sums){
if (v==vend)
return;
int size = sums.size();
for (int i=0; i<size; i++) {
for (int j=1; j<4; j++){
sums.push_back(sums[i]);
tget(sums.back(),j) += *v;
}
tget(sums[i],0) += *v;
}
v++;
comp_sums(v, vend, sums);
}
void docase( ) {
int N;
cin >> N;
vector<int> lens(N);
int lsum = 0;
for (int i=0; i<N; i++) {
cin >> lens[i];
lsum += lens[i];
}
if (lsum % 4 != 0 ) {
cout << 0 << endl;
return;
}
lsum = lsum/4;
vector<Tuple> sums1, sums2;
Tuple z(0,0,0,0); // if it's a tuple
//Tuple z = {0,0,0,0}; // If it's an array
sums1.push_back(z); sums1.reserve(1<<N/2);
sums2.push_back(z); sums2.reserve(1<<N/2);
comp_sums(lens.begin()+1, lens.begin()+N/2, sums1);
comp_sums(lens.begin()+N/2+1, lens.end(), sums2);
sort(sums1.begin(), sums1.end());
sort(sums2.begin(), sums2.end());
long count = 0;
for (int i=0; i<sums1.size(); i++) {
for (int k=0; k<4; k++) {
//Tuple t({lsum,lsum,lsum,lsum}); // if it's an array
Tuple t(lsum,lsum,lsum,lsum); // if it's a tuple
for (int j=0; j<4; j++)
tget(t,j) -= tget(sums1[i],(j+k)%4);
tget(t,k) -= lens[0];
tget(t,0) -= lens[N/2];
bool neg = false;
for (int j=0; j<4; j++)
if (tget(t,j)<0){
neg = true;
break;
}
if (neg)
continue;
auto LB = lower_bound(sums2.begin(), sums2.end(), t);
auto UB = upper_bound(LB, sums2.end(), t);
count += (UB - LB);
}
}
cout << count/6 << endl;
}
int main() {
std::ios_base::sync_with_stdio(false);
int T;
cin >> T;
while (T--) docase();
}
Here is also a test input I use. The first line says there are T=18 separate test cases. First line for each test case is N (up to 20), and then N integers follow. Here it is. The algorithm has exponential growth so the small numbers don't mean it's only a few operations. Also in measuring time I have accounted for the I/O:
18
8
132391 123854 21536 19482 133025 10945 121800 10311
8
12 4 12 4 4 12 12 24
8
131723 16253 23309 132227 125171 12253 16757 136227
8
8 4 4 8 4 8 12 24
8
12 12 12 8 4 8 12 28
8
115021 114654 112093 17443 20371 17810 17274 115190
8
8 8 4 4 12 4 12 20
8
12 4 4 4 4 12 12 28
8
8 12 8 12 8 4 12 28
8
4 12 8 12 8 8 4 24
20
34836 56869 24112 27796 198091 196472 50364 27364 29572 53701 52981 25779 202644 204884 53840 30056 48705 53092 43751 18419
20
207447 26867 41968 35977 36384 57042 40981 30191 47705 26907 248361 26003 53715 48237 27691 192772 60507 58194 20754 245913
20
34836 56869 24112 27796 198091 196472 50364 27364 29572 53701 52981 25779 202644 204884 53840 30056 48705 53092 43751 18419
20
207447 26867 41968 35977 36384 57042 40981 30191 47705 26907 248361 26003 53715 48237 27691 192772 60507 58194 20754 245913
20
4 2 2 4 4 2 4 2 4 2 2 4 4 3 3 4 2 3 4 1
20
2 3 2 2 3 2 3 4 3 2 4 4 4 3 2 3 4 4 3 3
20
4 4 4 4 3 3 3 2 2 4 2 4 2 2 4 2 2 2 2 1
20
4 4 2 3 4 3 4 3 2 4 4 2 2 4 3 4 3 3 2 4
Upvotes: 4
Views: 3614
Reputation: 37616
My guess is that the bottleneck is operator<
of std::tuple
vs. std::array
. If you try with this user defined structure:
struct my_struct {
int a, b, c, d;
};
inline bool operator<(my_struct const& lhs, my_struct const& rhs) {
return std::tie(lhs.a, lhs.b, lhs.c, lhs.d) <
std::tie(rhs.a, rhs.b, rhs.c, rhs.d);
}
This is about 5 times slower than std::tuple
or std::array
(tested on both my computer with MinGW and Wandbox).
Furthermore, the operator<
assembly for std::tuple
and std::array
are different (with -O3
), which may account for the performance difference between std::tuple
and std::array
.
fun(std::array<int, 4ul> const&, std::array<int, 4ul> const&):
mov ecx, DWORD PTR [rdi]
cmp DWORD PTR [rsi], ecx
lea rdx, [rsi+16]
lea rax, [rdi+16]
jg .L32
jl .L31
.L25:
add rdi, 4
add rsi, 4
cmp rax, rdi
je .L36
mov ecx, DWORD PTR [rsi]
cmp DWORD PTR [rdi], ecx
jl .L32
jle .L25
.L31:
xor eax, eax
ret
.L32:
mov eax, 1
ret
.L36:
cmp rdx, rsi
setne al
ret
fun(std::tuple<int, int, int, int> const&, std::tuple<int, int, int, int> const&):
mov edx, DWORD PTR [rsi+12]
cmp DWORD PTR [rdi+12], edx
mov eax, 1
jl .L11
mov eax, 0
jg .L11
mov ecx, DWORD PTR [rsi+8]
cmp DWORD PTR [rdi+8], ecx
mov eax, 1
jl .L11
mov eax, 0
jg .L11
mov ecx, DWORD PTR [rsi+4]
cmp DWORD PTR [rdi+4], ecx
mov eax, 1
jl .L11
mov eax, 0
jg .L11
mov eax, DWORD PTR [rsi]
cmp DWORD PTR [rdi], eax
setl al
.L11:
rep ret
Upvotes: 2
Reputation: 5222
Another question is similar to your question : Is it fine to replace an array with hierarchal structures and classes?
It has code and benchmark.
What is basically happening is that the tuple creates a structure which the compiler is able to better optimize operations on, than on an array of elements.
Upvotes: 1