MenNotAtWork

Reputation: 165

Nasm - Change variable in different threads

I have a program with a main thread and a second thread. The second thread modifies a global variable that the main thread then reads, but the change made in the second thread is not visible in the main thread.

section .bss USE32
  global var
  var resd 1

section .text USE32
  ..start:
  push 0
  push 0
  push 0
  push .second
  push 0
  push 0
  call [CreateThread]
  mov eax, 1
  cmp [var], eax ; --> the content of var and '1' are not the same. Which is confusing since I set the content of var to '1' in the second thread
  ;the other code here is not important

.second:
  mov eax, 1
  mov [var], eax
  ret

(This is a simplification of my real program which creates threads in a loop; I haven't tested this exact code.)

Upvotes: 1

Views: 192

Answers (1)

Peter Cordes

Reputation: 365157

You don't join the new thread (wait for it to exit); there's no reason to assume that it's finished (or even fully started) when CreateThread returns to the main thread.
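
Joining the thread before reading var could look like the following. This is a minimal, untested sketch: it assumes NASM's obj output format (suggested by the `..start:` label in the question) and that the kernel32 functions are imported with `extern`/`import`. The labels `fail` and `done`, the use of EBX to hold the handle, and the exit codes are illustrative choices, not taken from the original program.

```nasm
; Sketch only: assumes -f obj output and kernel32 imports as shown.
extern CreateThread
import CreateThread kernel32.dll
extern WaitForSingleObject
import WaitForSingleObject kernel32.dll
extern CloseHandle
import CloseHandle kernel32.dll
extern ExitProcess
import ExitProcess kernel32.dll

section .bss USE32
  var resd 1

section .text USE32
..start:
  push  0                      ; lpThreadId (not needed)
  push  0                      ; dwCreationFlags: run immediately
  push  0                      ; lpParameter
  push  second                 ; lpStartAddress
  push  0                      ; dwStackSize: default
  push  0                      ; lpThreadAttributes
  call  [CreateThread]         ; EAX = thread handle, or 0 on failure
  test  eax, eax
  jz    fail
  mov   ebx, eax               ; EBX is call-preserved, so the handle survives the calls below

  push  -1                     ; dwMilliseconds = INFINITE (0xFFFFFFFF)
  push  ebx                    ; hHandle
  call  [WaitForSingleObject]  ; blocks until the thread function has returned

  push  ebx
  call  [CloseHandle]

  xor   eax, eax               ; exit code 0 if the store was seen
  cmp   dword [var], 1         ; the thread has exited, so its store is visible
  je    done
fail:
  mov   eax, 1                 ; exit code 1 otherwise
done:
  push  eax
  call  [ExitProcess]

second:                        ; DWORD WINAPI ThreadProc(LPVOID): stdcall, one argument
  mov   dword [var], 1
  xor   eax, eax               ; thread exit code
  ret   4                      ; callee pops the 4-byte argument
```

Note that the thread function here returns with `ret 4`, because a stdcall thread procedure pops its own single argument.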

You could spin-wait until you see a non-zero value in [var], and count how many iterations that takes, if you want to benchmark thread-startup overhead + inter-core latency.

   ...
   call  [CreateThread]
   mov   edi, 1
   cmp   [var], edi
   je    .zero_latency   ; if var already changed

   rdtsc                 ; could put an lfence before and/or after this to serialize execution
   mov   ecx, eax        ; save low half of EDX:EAX cycle count; should be short enough that the interval fits in 32 bits
   xor   esi, esi
 .spin:
   inc   esi             ; ++spin_count
   pause                 ; optional, but avoids memory-order mis-speculation when var changes
   cmp   [var], edi
   jne   .spin

   rdtsc
   sub   eax, ecx        ; reference cycles since CreateThread returned
   ...
 .zero_latency:          ; jump here if the value already changed before the first iteration

Note that rdtsc counts reference cycles, not core clock cycles, so the result can't be read directly as core cycles when the CPU is running at a turbo (or reduced) frequency. Using only the low 32 bits of the 64-bit subtraction is fine as long as the interval is less than 2^32 reference cycles (e.g. about 1 second on a CPU with a reference frequency of 4.2 GHz, vastly longer than we'd expect here).
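
If the interval could ever exceed 2^32 reference cycles, the full 64-bit difference can be computed with sub/sbb instead. A minimal sketch, assuming EBX and ECX are free to hold the start timestamp (that register choice is an assumption for illustration, and the timed code must not clobber them):

```nasm
   rdtsc                 ; EDX:EAX = start timestamp
   mov   ecx, eax        ; start, low half
   mov   ebx, edx        ; start, high half

   ; ... code being timed (must preserve EBX and ECX) ...

   rdtsc                 ; EDX:EAX = end timestamp
   sub   eax, ecx        ; subtract low halves; CF is set on borrow
   sbb   edx, ebx        ; subtract high halves and the borrow
                         ; EDX:EAX = elapsed reference cycles (64-bit)
```

The low half in EAX is the same value the 32-bit-only version produces, so this only costs two extra instructions and one extra register.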

esi is the spin count. With pause in the loop, you'll do about one check per 100 cycles on Skylake and later, or about one check per 5 cycles on earlier Intel CPUs. Without pause, it's about one check per core clock cycle.

Upvotes: 3
