Reputation: 352
I thought this would be easy, but after looking at what's going on I'm not so sure. Is reading/writing a single binary integer atomic? That's how the underlying hardware handles reads and writes of 32-bit integers. After some research, I realized Python does not store integers as a simple collection of bytes; it doesn't even store bytes as a collection of bytes. There's overhead involved. Does this overhead break the atomic nature of reading and writing a binary integer?
Here is some code I used trying to figure this out:
import time
import sys

tm = time.time()              # current time as a float
int_tm = int(tm * 1000000)    # microseconds since the epoch, as an int
bin_tm = bin(int_tm)          # binary *string* representation, e.g. '0b101...'
int_bin_tm = int(bin_tm, 2)   # parse the binary string back into an int

print('tm:', tm, ", Size:", sys.getsizeof(tm))
print('int_tm:', int_tm, ", Size:", sys.getsizeof(int_tm))
print('bin_tm:', bin_tm, ", Size:", sys.getsizeof(bin_tm))
print('int_bin_tm:', int_bin_tm, ", Size:", sys.getsizeof(int_bin_tm))
Output:
tm: 1581435513.076924 , Size: 24
int_tm: 1581435513076924 , Size: 32
bin_tm: 0b101100111100100111010100101111111011111110010111100 , Size: 102
int_bin_tm: 1581435513076924 , Size: 32
A couple of side questions: does Python's binary representation of integers really consume that much more memory? And am I using the wrong type for converting decimal integers to bytes?
Upvotes: 2
Views: 196
Reputation: 11075
Python doesn't guarantee atomicity for anything other than specific mutex constructs like locks and semaphores. Some operations will seem atomic because the GIL prevents bytecode from being run on multiple Python threads at once: "This lock is necessary mainly because CPython's memory management is not thread-safe".
Basically, what this means is that Python will ensure an entire bytecode instruction is completely evaluated before allowing another thread to continue. It does not, however, mean that an entire line of code is guaranteed to complete without interruption. This is especially true with function calls. For a deeper look at this, take a look at the dis module.
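For example, a single augmented assignment on a global variable compiles to several separate instructions, which dis makes visible. A quick sketch (the counter name here is just for illustration):

import dis

counter = 0

def increment():
    global counter
    counter += 1  # looks atomic at the source level, but isn't

# The disassembly shows a separate load, add, and store -- another thread
# can be scheduled between any of those instructions.
dis.dis(increment)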
I will also point out that this talk of atomicity means nothing at the hardware level; the whole idea of an interpreted language is to abstract away the hardware. If you want "actual" hardware atomicity, it will generally be a function provided by the operating system (which is how Python likely implements things like threading.Lock).
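So if you need a read-modify-write to be indivisible across threads, the usual approach is to guard it with a lock. A minimal sketch (the names are illustrative, not from the question):

import threading

counter = 0
lock = threading.Lock()

def safe_increment():
    global counter
    # The lock (backed by an OS-level primitive) ensures the whole
    # read-modify-write happens without interference from other threads.
    with lock:
        counter += 1

threads = [threading.Thread(target=safe_increment) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # always 10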
A note on data sizes (just a quickie, because this is really a whole other question):
sizeof(tm): 8 bytes for the 64-bit float itself, plus 8 bytes for a pointer to the type and 8 bytes for the reference count.

sizeof(int_tm): ints are a bit more complicated. Some small values are "cached" for efficiency, and larger values use a flexible representation where the number of bytes used to store the int can grow to however big it needs to be.

sizeof(bin_tm): this is actually a string, which is why it takes so much more memory than just a number; there is a fairly significant per-object overhead, plus at least one byte per character.
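As a quick illustration of the int behaviour, getsizeof grows in steps as the value needs more internal digits (exact numbers vary by CPython version and platform):

import sys

# CPython ints carry a fixed object header plus a variable number of
# fixed-size "digit" chunks, so the reported size jumps as values grow.
for value in (0, 1, 2**30, 2**60, 2**120):
    print(value, sys.getsizeof(value))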
"Am I using the wrong type for converting..?" We need to know what you're trying to do with the result to answer this.
Upvotes: 3