Reputation: 7216
Assume I have the following code:
int x[200];

void thread1() {
    for (int i = 0; i < 100; i++)
        x[i * 2] = 1;
}

void thread2() {
    for (int i = 0; i < 100; i++)
        x[i * 2 + 1] = 1;
}
Is this code correct under the x86-64 memory model (from what I understand, it is), assuming the page was configured with the default write-cache policy in Linux? And what is the performance impact of such code (from what I understand, none)?
PS. As for performance: I am mostly interested in Sandy Bridge.
EDIT: As for my expectation: I want to write to aligned locations from different threads. I expect that after the code above finishes and a barrier is reached, x contains {1,1,1,...} rather than {0,1,0,1,...} or {1,0,1,0,...}.
Upvotes: 1
Views: 328
Reputation: 7216
If I understand correctly, the writes will eventually propagate via snoop requests. Sandy Bridge uses QuickPath between cores, so the snooping does not hit the FSB but uses a much quicker interconnect. As it is not based on cache-invalidation-on-write, it should be 'fairly' quick, although I was not able to find the overhead of conflict resolution (it is probably lower than an L3 write).
EDIT: According to the Intel® 64 and IA-32 Architectures Optimization Reference Manual, a clean hit in another core's cache costs 43 cycles and a dirty hit costs 60 cycles (compared with the normal latencies of about 4 cycles for L1, 12 for L2, and 26-31 for L3).
Upvotes: 1