zrf

Reputation: 88

Python copy-on-write or copy-on-access shared memory

I'm trying to understand how memory sharing between processes works, and I'm stuck.
I'm using a very simple test program, c.py, and tracking memory with smem.

c.py:

import sys
import time
from multiprocessing import Process

arr = [x for x in range(int(1e6) * 50)]
print(sys.getsizeof(arr))  # 411943896

def f():
    x = 0
    for i in range(len(arr)):
        #x += arr[i]
        pass
    time.sleep(10)

p = Process(target=f)
p.start()
p.join()

When I run it with x += arr[i] commented out, I see the following results:

    PID User     Command                        Swap      USS      PSS      RSS
1693779 1000     python /usr/bin/smem -n -t        0     8368     9103    14628
1693763 1000     python c.py                       0     1248   992816  1986688
1693749 1000     python c.py                       0     1244   993247  1989752
-------------------------------------------------------------------------------
      3 1                                          0    10860  1995166  3991068

If I understand correctly, PSS is telling me that my single global array arr is shared between the two processes, and USS shows very little unique memory allocated per process.

However, when I uncomment x += arr[i], just accessing the array elements in the child process yields very different results:

    PID User     Command                        Swap      USS      PSS      RSS
1695338 1000     python /usr/bin/smem -n -t        0     8476     9508    14392
1695296 1000     python c.py                      64  1588472  1786582  1986708
1695280 1000     python c.py                       0  1588644  1787246  1989520
-------------------------------------------------------------------------------
      3 1                                         64  3185592  3583336  3990620

I don't understand this. It seems that accessing the array caused it to be copied into the child process, meaning that Python actually copies shared memory on access, not on write.

  1. Is my understanding correct? Was the memory where arr's data resides copied to the child process when the global variable arr was accessed?

  2. If so, is there no way for the child process to access global variables without doubling memory usage?

  3. I would love it if someone could explain the overall memory usage that smem reports in this case, although I expect that question is better suited for SU. If simple copying took place, I would expect the memory to double. However, each process shows unique memory of about 1588472, and on top of that the PSS is 2 x 1786582, so the total comes to about 6750108? I'm pretty sure my understanding here is very wrong, but I don't know how to interpret it.
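For reference when reading the columns: PSS already includes USS (it is the private pages plus a proportional share of the shared ones), so adding USS and PSS counts the private pages twice. The sketch below checks that relationship against the child row of the second table. The helper name `estimate_pss` and the assumption that every shared page is mapped by exactly two processes are illustrative only; smem values are taken to be in KiB.

```python
def estimate_pss(uss, rss, sharers=2):
    """Approximate PSS from USS and RSS.

    Assumes every shared page is mapped by exactly `sharers` processes,
    which is only roughly true (shared libraries may have more users).
    """
    shared = rss - uss            # pages that are also mapped elsewhere
    return uss + shared / sharers


# Child process row from the second smem table (values assumed to be KiB).
uss, pss, rss = 1588472, 1786582, 1986708

print(rss - uss)                # 398236 KiB still shared with the parent
print(estimate_pss(uss, rss))   # 1787590.0, close to the reported PSS
print(uss + pss)                # 3375054, counts the private pages twice
```

Under that reading, each process's footprint is its RSS, and the system-wide total is closer to the sum of the PSS column than to USS plus PSS.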

Upvotes: 3

Views: 1412

Answers (1)

user2357112

Reputation: 281013

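You are writing to the elements. The standard implementation of Python (CPython) uses reference counting, and each object stores its reference count in its own header, so even looking at an object requires a write to its reference count. That write dirties the memory page holding the object, and copy-on-write then gives the child its own copy of that page.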
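A minimal sketch of the effect, assuming CPython: it measures how an element's reference count changes while a local name holds it, which is the same kind of access as arr[i] in the question. The helper name `refcount_delta_on_access` is made up for illustration, and plain `object()` instances are used instead of integers because some CPython versions treat small integers specially.

```python
import sys


def refcount_delta_on_access(items, index):
    """Return how much an element's refcount rises while a local name holds it."""
    before = sys.getrefcount(items[index])
    held = items[index]                      # the "read" stores a new reference
    after = sys.getrefcount(items[index])
    del held                                 # dropping it writes to the object again
    return after - before


data = [object() for _ in range(3)]
delta = refcount_delta_on_access(data, 0)
print(delta)  # positive: reading the element modified the object's header
```

Because the counter lives inside each object, a loop that touches every element writes to every page that holds those objects, even though the program never assigns to the list.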

Upvotes: 3
