Reputation: 5693
I am using the Python language and I use CPU threads via the threading.Thread wrapper. In some way, the Python interpreter converts my code into .pyc bytecode with its JIT. (Please provide a reference to a Python bytecode standard; as far as I know no such standard exists, and there is no standard for the language either.)
Then these virtual commands are executed. The real commands for Intel CPUs are x86/x64 instructions, and for ARM CPUs they are AArch64/AArch32 instructions.
My problem: I want to perform an action within the Python programming language that enforces an ordering constraint between memory operations.
What I want to know:
Q1: How can I emit such a command (a memory fence instruction)?
Q2: How can I specify that some memory is volatile and should not be kept in a CPU register for optimization purposes?
Upvotes: 2
Views: 381
Reputation: 5693
About Python Threads:
Python threads are, first of all, tricky. The interpreter uses real POSIX/WinAPI threads: a threading.Thread is a real OS thread under the hood.
The thread execution model is quite specific and has been described as "cooperative multitasking" by one enthusiast (David Beazley, https://www.dabeaz.com/about.html).
As David Beazley explains in https://www.dabeaz.com/python/UnderstandingGIL.pdf, when a thread is waiting on I/O it releases the global lock (the GIL, Global Interpreter Lock); the lock is released for the duration of the blocking system call and reacquired afterwards.
Next, the CPython VM performs a periodic "check": a CPU-bound thread is forced to give up the GIL at regular intervals (every 100 bytecode "ticks" with the old GIL, every 5 ms by default with the new GIL introduced in CPython 3.2).
At each check, the running thread releases the GIL and the waiting threads compete to acquire it again.
There is no thread scheduler inside Python itself; scheduling is left to the operating system.
Multithreading CPU-bound Python code can in fact hurt performance.
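A minimal sketch of that effect (the numbers and timings are illustrative, not from the slides): two CPU-bound threads take roughly as long as doing the same work sequentially, because only one of them can execute Python bytecode at a time.

import sys
import threading
import time

def count(n):
    # Pure-Python CPU-bound loop; it holds the GIL except at periodic checks.
    while n > 0:
        n -= 1

N = 10_000_000

# Sequential baseline.
start = time.perf_counter()
count(N)
count(N)
print("sequential:", time.perf_counter() - start)

# Two threads: no speedup expected, since the GIL lets only one thread
# run Python bytecode at any moment.
start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print("two threads:", time.perf_counter() - start)

# The interval at which a CPU-bound thread is asked to give up the GIL
# (5 ms by default on CPython 3.2+).
print("switch interval:", sys.getswitchinterval())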
Upvotes: 0
Reputation: 1042
CPython does not have a JIT - though it may do one day.
So your code is only converted into bytecode, which will be interpreted, and not into actual Intel/etc. machine code.
Additionally, Python has what's known as the GIL - Global Interpreter Lock - meaning that even if you have multiple Python threads in a process, only one of them can be interpreting code at once - though this may also change one day. Threads were frequently useful for doing I/O, because I/O operations are able to happen at the same time, but these days asyncio is a good competitor for doing that.
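A minimal sketch of the asyncio alternative mentioned above (the task names and delays are made up): several waits run concurrently on a single thread, with no Python threads involved.

import asyncio

async def fake_io(name, delay):
    # Stands in for a network or disk operation; asyncio.sleep yields to
    # the event loop, so other tasks can run while this one waits.
    await asyncio.sleep(delay)
    return f"{name} done after {delay}s"

async def main():
    # Three one-second waits finish in about one second total.
    results = await asyncio.gather(
        fake_io("a", 1), fake_io("b", 1), fake_io("c", 1)
    )
    print(results)

asyncio.run(main())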
So, in response to Q1, it doesn't make any sense to "put an mfence in Python code" (or the like).
Instead, what you probably want to do, if you want to enforce ordering constraints between one bit of code being executed and another, is use more high-level strategies, such as Python threading.Lock, queue.Queue, or similar equivalents in the asyncio world.
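A minimal sketch of those high-level tools (the producer/consumer structure is just an example): queue.Queue does its own locking, so the consumer sees items in exactly the order the producer put them, and threading.Lock protects any shared state you mutate directly.

import queue
import threading

q = queue.Queue()        # thread-safe FIFO; all locking is handled internally
lock = threading.Lock()  # protects shared state that is mutated directly
shared = []

def producer():
    for i in range(5):
        q.put(i)         # a later get() is ordered after this put()
    q.put(None)          # sentinel telling the consumer to stop

def consumer():
    while True:
        item = q.get()
        if item is None:
            break
        with lock:       # only one thread mutates the list at a time
            shared.append(item)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(shared)  # [0, 1, 2, 3, 4] -- in producer order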
As for Q2, are you thinking of the C/C++ keyword volatile? This is frequently mistakenly thought of as a way to make memory access atomic for use in multithreading - it isn't. All that C/C++ volatile does is ensure that memory reads and writes happen exactly as specified rather than being possibly optimised out. What is your use case? There are all sorts of strategies one can use in Python to optimise code, but it's an impossible question to answer without knowing what you're actually trying to do.
The CPU executes instructions, and something has to emit those instructions. I'm calling "a JIT" the part inside the Python interpreter that emits instructions at the end of the day.
CPython is an interpreter - it does not emit instructions. JITs do emit instructions, but as stated above, CPython does not have a JIT. When you run a Python script, CPython will compile your text-based .py file into bytecode, and then it will spend the rest of its time working through the bytecode and doing what the bytecode says. The only machine instructions being executed are those that are emitted by whoever compiled CPython.
If you compile a Python script to a .pyc and then execute that, CPython will do exactly the same, it will just skip the "compile your text-based .py file into bytecode" part as it's already done it - the result of that step is stored in the .pyc file.
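If you want to see that bytecode yourself, the standard dis module will print it (the function below is just an example):

import dis

def add(a, b):
    return a + b

# Prints the bytecode CPython interprets for add(), e.g. LOAD_FAST and
# BINARY_ADD (BINARY_OP on 3.11+); no machine instructions are emitted.
dis.dis(add)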
I was a bit vague in naming. Do you mean that in Python, the instruction is re-executed each time the interpreter meets the instruction?
A real CPU executing machine code will "re-execute" each instruction as it reads it, sure. CPython will do the same thing with Python bytecode. (CPython will execute a whole bunch of real machine code instructions each time it reads one Python instruction.)
Thanks, I have found this notice: https://docs.python.org/3/extending/extending.html - "CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once". OK, so when a Python thread goes into native code through C/C++ bindings, what will happen? Q1-A: can another Python thread be executed during that time? Q1-B: if I create another thread inside the C++ code, what will happen?
Native code can release the GIL but must lock it again before returning to Python code.
Typically, native code that does some CPU-intensive work or does some I/O that requires waiting would release the GIL while it does that work or waits on that I/O. This allows Python code on another Python thread to run at the same time. But at no point does Python code run on two threads at once. That's why it makes no sense to put native ordering instructions and the like in Python code.
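A minimal sketch of that behaviour, using time.sleep (implemented in C; it releases the GIL while blocking) as a stand-in for GIL-releasing native code:

import threading
import time

def wait_one_second():
    # time.sleep drops the GIL while it blocks, so other Python threads
    # are free to run (or, as here, to sleep at the same time).
    time.sleep(1)

start = time.perf_counter()
threads = [threading.Thread(target=wait_one_second) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Roughly 1 second total, not 5: the waits overlapped because the native
# sleep call released the GIL.
print("elapsed:", time.perf_counter() - start)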
Native code that needs to use native ordering instructions for its own purposes will obviously do that, but that is C/C++ code, not Python code. If you want to know how to put native ordering instructions in a C/C++ Python extension, then look up how to do it in any C/C++ code - the fact that it's a Python extension is irrelevant.
Basically, either write Python code and use high-level strategies like what I mentioned above, or write a native extension in C/C++ and do whatever you need to do in that language.
I need to learn more about the GIL, and it seems there is a good study of it by David Beazley: https://dabeaz.com/python/UnderstandingGIL.pdf. But @Keiji - you may be wrong about Q1: CPython threads seem to be real threads, and if the implementer of a C/C++ extension (which is what almost all Python libraries are) decides to release the GIL, that is possible... So Q1 still makes sense...
I've covered this above. Python code can't interact with native code in a way that would require putting native ordering instructions in Python code.
Back to the question - I mean volatile in the C++ sense of suppressing the compiler optimization that keeps a variable in a register. In C++ it guarantees neither atomicity nor a memory fence. So, regarding volatile: how can I specify it for an integer variable or a user-defined type?
If you want to make something in C/C++ be volatile, use the volatile keyword. If you're writing Python code, it doesn't make any sense to make something volatile.
Upvotes: 1