Harry de winton

Reputation: 1069

Storing strings in a multiprocessing sharedctypes Array

I am trying to share some strings between processes using the sharedctypes part of the multiprocessing module.

TL;DR: I wish to put my strings into a sharedctypes array, like so:

import ctypes
from multiprocessing.sharedctypes import Array

Array(ctypes.c_char, ['a string', 'another string'])

More Information:

The docs have this note:

"Note that an array of ctypes.c_char has value and raw attributes which allow one to use it to store and retrieve strings."

Using c_char alone:

import ctypes
from multiprocessing.sharedctypes import Array

Array(ctypes.c_char, ['a string', 'another string'])

I get a type error, which makes sense:

TypeError: one character bytes, bytearray or integer expected

This can (kind of) work by splitting the string into single bytes (which also makes sense):

import ctypes
from multiprocessing.sharedctypes import Array

Array(ctypes.c_char, [b's', b't', b'r', b'i', b'n', b'g'])

But this is not very convenient for storing large lists of strings.

However, when I tried using the value and raw attributes shown in the docs and mentioned in that note, there was still no magic:

Array(ctypes.c_char.value, ['string'])

gives this error:

TypeError: unsupported operand type(s) for *: 'getset_descriptor' and 'int'

and raw gives this:

Array(ctypes.c_char.raw, ['string'])

AttributeError: type object 'c_char' has no attribute 'raw'

I have also tried using the c_wchar_p type which in the table of primitive C compatible data types (found in the docs) corresponds directly to a string:

Array(ctypes.c_wchar_p, ['string'])

This crashes Python: no error is reported, and the process simply exits with code 0.

Why can't sharedctypes arrays hold pointers like the c_wchar_p type? Any other solution or advice on how to store strings in a sharedctypes array is very welcome!

Update: this code occasionally works (most of the time Python stops working, but occasionally I get strings back, although they are mostly gibberish). The comments, however, mention it working fine on Windows.

from multiprocessing import Process, Lock
from multiprocessing.sharedctypes import Value, Array
import ctypes


def print_strings(S):
    """Print strings in the C array"""
    print([a for a in S])

if __name__ == '__main__':
    lock = Lock()
    string_array = Array(ctypes.c_wchar_p, ['string'])
    q = Process(target=print_strings, args=(string_array,))
    q.start()
    q.join()

Update 2

This is the gibberish I get:

['汣獵癩汥⁹景椠瑮搠祴数\u2e73ਊ††敓\u2065汁潳\u200a†ⴠⴭⴭⴭਭ††捳灩\u2e79灳捥慩\u2e6c癩\u202c捳灩\u2e79灳捥慩\u2e6c癩\u0a65\u200a†丠瑯獥\u200a†ⴠⴭⴭ\u200a†圠\u2065獵\u2065桴\u2065污潧楲桴\u206d異汢獩敨\u2064祢䌠敬獮慨⁷ㅛ彝愠摮爠晥牥湥散\u2064祢\u200a†䄠牢浡睯瑩⁺湡\u2064瑓来湵嬠崲\u2c5f映牯眠楨档琠敨映湵瑣潩\u206e潤慭湩椠ੳ††慰瑲瑩潩敮\u2064湩潴琠敨琠潷椠瑮牥慶獬嬠ⰰ崸愠摮⠠ⰸ湩⥦\u202c湡\u2064桃扥獹敨\u0a76††潰祬潮業污攠灸湡楳湯\u2073牡\u2065浥汰祯摥椠\u206e慥档椠瑮牥慶\u2e6c删汥瑡癩\u2065牥潲\u2072湯\u200a†琠敨搠浯楡\u206eせ㌬崰甠楳杮䤠䕅⁅牡瑩浨瑥捩椠\u2073潤畣敭瑮摥嬠崳\u205f獡栠癡湩\u2067\u0a61††数歡漠\u2066⸵攸ㄭ‶楷桴愠\u206e浲\u2073景ㄠ㐮ⵥ㘱⠠\u206e‽〳〰⤰ਮ\u200a†删晥牥湥散ੳ††ⴭⴭⴭⴭⴭ\u200a†⸠\u202eㅛ⁝\u2e43圠\u202e汃湥桳睡\u202c䌢敨祢桳癥猠牥敩\u2073潦\u2072慭桴浥瑡捩污映湵瑣潩獮Ⱒ椠੮†††††⨠慎楴湯污倠票楳慣\u206c慌潢慲潴祲䴠瑡敨慭楴慣\u206c慔汢獥Ⱚ瘠汯\u202eⰵ䰠湯潤㩮\u200a†††††效\u2072慍敪瑳❹\u2073瑓瑡潩敮祲传晦捩ⱥㄠ㘹⸲\u200a†⸠\u202e㉛⁝\u2e4d䄠牢浡睯瑩⁺湡\u2064\u2e49䄠\u202e瑓来湵\u202c䠪湡扤潯\u206b景䴠瑡敨慭楴慣੬†††††䘠湵瑣潩獮Ⱚㄠ琰\u2068牰湩楴杮\u202c敎⁷潙歲›潄敶Ⱳㄠ㘹ⰴ瀠\u2e70㌠㤷ਮ†††††栠瑴㩰⼯睷\u2e77慭桴献畦挮⽡捾浢愯湡獤瀯条彥㜳⸹瑨੭††⸮嬠崳栠瑴㩰⼯潫敢敳牡档挮慰\u2e6e牯⽧瑨潤獣䴯瑡\u2d68敃桰獥䴯瑡⽨敃桰獥栮浴੬\u200a†䔠慸灭敬ੳ††ⴭⴭⴭⴭ\u200a†㸠㸾渠\u2e70ど嬨⸰⥝\u200a†愠牲祡ㄨ〮\u0a29††㸾‾灮椮⠰せⰮㄠ\u202e\u202b樲⥝\u200a†愠牲祡嬨ㄠ〮〰〰〰⬰⸰\u206a†††Ⱐ†⸰㠱㠷㌵㌷〫㘮㘴㘱㐹樴⥝ਊ††', 'ਊ††敓\u2065汁潳\u200a†ⴠⴭⴭⴭਭ††捳灩\u2e79灳捥慩\u2e6c癩\u202c捳灩\u2e79灳捥慩\u2e6c癩\u0a65\u200a†丠瑯獥\u200a†ⴠⴭⴭ\u200a†圠\u2065獵\u2065桴\u2065污潧楲桴\u206d異汢獩敨\u2064祢䌠敬獮慨⁷ㅛ彝愠摮爠晥牥湥散\u2064祢\u200a†䄠牢浡睯瑩⁺湡\u2064瑓来湵嬠崲\u2c5f映牯眠楨档琠敨映湵瑣潩\u206e潤慭湩椠ੳ††慰瑲瑩潩敮\u2064湩潴琠敨琠潷椠瑮牥慶獬嬠ⰰ崸愠摮⠠ⰸ湩⥦\u202c湡\u2064桃扥獹敨\u0a76††潰祬潮業污攠灸湡楳湯\u2073牡\u2065浥汰祯摥椠\u206e慥档椠瑮牥慶\u2e6c删汥瑡癩\u2065牥潲\u2072湯\u200a†琠敨搠浯楡\u206eせ㌬崰甠楳杮䤠䕅⁅牡瑩浨瑥捩椠\u2073潤畣敭瑮摥嬠崳\u205f獡栠癡湩\u2067\u0a61††数歡漠\u2066⸵攸ㄭ‶楷桴愠\u206e浲\u2073景ㄠ㐮ⵥ㘱⠠\u206e‽〳〰⤰ਮ\u200a†删晥牥湥散ੳ††ⴭⴭⴭⴭⴭ\u200a†⸠\u202eㅛ⁝\u2e43圠\u202e汃湥桳睡\u202c䌢敨祢桳癥猠牥敩\u2073潦\u2072慭桴浥瑡捩污映湵瑣潩獮Ⱒ椠੮†††††⨠慎楴湯污倠票楳慣\u206c慌潢慲潴祲䴠瑡敨慭楴慣\u206c慔汢獥Ⱚ瘠汯\u202eⰵ䰠湯潤㩮\u200a†††††效\u2072慍敪瑳❹\u2073瑓瑡潩敮祲传晦捩ⱥㄠ㘹⸲\u200a†⸠\u202e㉛⁝\u2e4d䄠牢浡睯瑩⁺湡\u2064\u2e49䄠\u202e瑓来湵\u202c䠪湡扤潯\u206b景䴠瑡敨慭楴慣੬†††††䘠湵瑣潩獮Ⱚㄠ琰\u2068牰湩楴杮\u202c敎⁷潙歲›潄敶Ⱳㄠ㘹ⰴ瀠\u2e70㌠㤷ਮ†††††栠瑴㩰⼯睷\u2e77慭桴献畦挮⽡捾浢愯湡獤瀯条彥㜳⸹瑨੭††⸮嬠崳栠瑴㩰⼯潫敢敳牡档挮慰\u2e6e牯⽧瑨潤獣䴯瑡\u2d68敃桰獥䴯瑡⽨敃桰獥栮浴੬\u200a†䔠慸灭敬ੳ††ⴭⴭⴭⴭ\u200a†㸠㸾渠\u2e70ど嬨⸰⥝\u200a†愠牲祡ㄨ〮\u0a29††㸾‾灮椮⠰せⰮㄠ\u202e\u202b樲⥝\u200a†愠牲祡嬨ㄠ〮〰〰〰⬰⸰\u206a†††Ⱐ†⸰㠱㠷㌵㌷〫㘮㘴㘱㐹樴⥝ਊ††']

(yes that apparently all came from 'string', don't ask me how)

Upvotes: 0

Views: 4078

Answers (2)

Mark Tolonen

Reputation: 177640

An additional example that gets .raw and .value to work. Per the documentation, these attributes are available only for Array(ctypes.c_char, ...):

from multiprocessing import Process
from multiprocessing.sharedctypes import Value, Array
import ctypes

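def print_strings(s):
    """Print the shared string, then modify it in place."""
    print(s.value)        # bytes up to the first NUL
    print(len(s))         # size of the shared buffer in bytes
    s[len(s) - 1] = b'x'  # overwrite the last byte; the parent sees this change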

if __name__ == '__main__':
    string_array = Array(ctypes.c_char, b'string')
    q = Process(target=print_strings, args=(string_array,))
    q.start()
    q.join()
    print(string_array.raw)

Output showing that shared buffer was modified:

b'string'
6
b'strinx'
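
To get from one shared byte string to the list of strings asked about in the question, one option is to join the UTF-8 encoded strings with NUL bytes and split them again on the other side. This is only a sketch: pack_strings and unpack_strings are hypothetical helper names, and it assumes that no string contains a NUL character. Note that .value stops at the first NUL, so .raw is what recovers the whole buffer.

```python
import ctypes
from multiprocessing import Process
from multiprocessing.sharedctypes import Array


def pack_strings(strings, lock=True):
    """Store a list of str in one shared c_char array, NUL-separated.

    Hypothetical helper; assumes no string contains a NUL character.
    """
    encoded = [s.encode("utf-8") for s in strings]
    if any(b"\0" in e for e in encoded):
        raise ValueError("strings must not contain NUL characters")
    return Array(ctypes.c_char, b"\0".join(encoded), lock=lock)


def unpack_strings(shared):
    """Recover the list of str from an array built by pack_strings."""
    # .raw returns the whole buffer; .value would stop at the first NUL.
    return [chunk.decode("utf-8") for chunk in shared.raw.split(b"\0")]


def child(shared):
    """Runs in the second process and prints the recovered strings."""
    print(unpack_strings(shared))


if __name__ == "__main__":
    shared = pack_strings(["a string", "another string"])
    p = Process(target=child, args=(shared,))
    p.start()
    p.join()
```

Because the separators are part of the buffer, replacing a string with one of a different length means rebuilding the array, so this layout suits data that is written once and read many times.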

Upvotes: 1

javidcf

Reputation: 59701

The problem that you are having is mentioned in the documentation:

Note: Although it is possible to store a pointer in shared memory remember that this will refer to a location in the address space of a specific process. However, the pointer is quite likely to be invalid in the context of a second process and trying to dereference the pointer from the second process may cause a crash.

This means that storing pointers (such as c_wchar_p strings) is not going to work: only the address reaches the child process, and that address is no longer valid there (hence the segmentation fault). Consider, for example, this alternative, where all the strings are concatenated into one array and a second array with the lengths is passed too (you can tweak it to your convenience):

from multiprocessing import Process
from multiprocessing.sharedctypes import Array
import ctypes

def print_strings(S, S_len):
    """Print strings in the C array"""
    received_strings = []
    start = 0
    # Slice the concatenated characters back into the original strings.
    for length in S_len:
        received_strings.append(S[start:start + length])
        start += length
    print("received strings:", received_strings)

if __name__ == '__main__':
    my_strings = ['string1', 'str2']
    my_strings_len = [len(s) for s in my_strings]
    # One shared array holds all characters; a second holds each string's length.
    string_array = Array(ctypes.c_wchar, ''.join(my_strings))
    string_len_array = Array(ctypes.c_uint, my_strings_len)
    q = Process(target=print_strings, args=(string_array, string_len_array))
    q.start()
    q.join()

Output:

received strings: ['string1', 'str2']
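
A possible variant of the same idea, sketched here under my own assumptions rather than taken from the code above: store the strings as UTF-8 bytes in a c_char array and keep cumulative byte offsets instead of lengths. This avoids depending on the platform's wchar_t width and lets a process read the i-th string without walking through the earlier ones. The names share_strings and get_string are hypothetical.

```python
import ctypes
from multiprocessing.sharedctypes import Array


def share_strings(strings, lock=True):
    """Return (data, bounds): UTF-8 bytes plus cumulative byte offsets.

    Hypothetical helper; bounds has len(strings) + 1 entries, so string i
    occupies data[bounds[i]:bounds[i + 1]].
    """
    encoded = [s.encode("utf-8") for s in strings]
    offsets = [0]
    for chunk in encoded:
        offsets.append(offsets[-1] + len(chunk))
    data = Array(ctypes.c_char, b"".join(encoded), lock=lock)
    bounds = Array(ctypes.c_size_t, offsets, lock=lock)
    return data, bounds


def get_string(data, bounds, i):
    """Decode the i-th string from the shared arrays."""
    # Slicing a c_char array yields bytes, which are then decoded.
    return data[bounds[i]:bounds[i + 1]].decode("utf-8")


if __name__ == "__main__":
    data, bounds = share_strings(["string1", "str2"])
    print([get_string(data, bounds, i) for i in range(len(bounds) - 1)])
```

Both arrays can be passed to a Process through args exactly as in the example above; the offsets are measured in bytes, not characters, so they stay correct for non-ASCII text.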

About addresses in subprocesses:

This is a bit off topic for the question, but it was too long to put into a comment. Honestly, this starts to be out of my depth (take a look at eryksun's comments below for more informed insights), but here is my understanding anyway.

On Unix(-like) systems, a new process created through fork has the same memory and (virtual) addresses as the parent process, but if you then exec some program, that is no longer the case. I don't know whether Python's multiprocessing runs an exec on Unix (note: see eryksun's comment for more on this and on set_start_method), but in any case I wouldn't assume there is any guarantee that an address in the Python-managed memory pool stays the same. On Windows, CreateProcess makes a new process from an executable that in principle has nothing in common with the parent. I don't think even shared libraries used by multiple processes (.so/.dll) are necessarily at the same address on either platform.

I also don't think sharing (virtual) addresses between processes makes sense when using shared memory since, if I recall correctly (and I may not), shared memory blocks are mapped to arbitrary virtual addresses in each process. So my impression is that there is no good reason (or no "good and obvious" one, at least) to share addresses with a subprocess. Of course, pointer types in ctypes are still useful for talking to native libraries within the same process.

As I said, I'm not 100% confident in this, but I think the general idea goes like that.
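
To make the point about pointers a little more concrete, here is a small single-process sketch (describe_pointer_array is a hypothetical helper, not part of ctypes). It shows that an array of c_wchar_p occupies one pointer-sized slot per string regardless of how long the strings are, so the only thing such an array could place in shared memory is a set of addresses, never the characters themselves.

```python
import ctypes


def describe_pointer_array(strings):
    """Return (size of the array in bytes, list of the addresses it stores).

    Hypothetical helper for illustration only.
    """
    arr = (ctypes.c_wchar_p * len(strings))(*strings)
    # Reinterpret the same memory as void pointers to read the raw addresses.
    as_void = ctypes.cast(arr, ctypes.POINTER(ctypes.c_void_p))
    addresses = [as_void[i] for i in range(len(strings))]
    return ctypes.sizeof(arr), addresses


if __name__ == "__main__":
    size, addresses = describe_pointer_array(["string", "x" * 1000])
    print("array size in bytes:", size)
    print("pointer size in bytes:", ctypes.sizeof(ctypes.c_void_p))
    print("stored addresses:", [hex(a) for a in addresses])
```

The addresses printed are only meaningful inside the process that created the strings, which is consistent with the crash and the gibberish described in the question.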

Upvotes: 3
