idanshmu

Reputation: 5251

What is the recommended way for DLLs to share data?

Flow

I have a reader DLL written in C++.
I also have a writer DLL written in another language (not C++).
Both DLLs run synchronously in the same process.

  1. The reader DLL calls the writer DLL's API, GetData.
  2. The writer DLL prepares the data, e.g. by downloading or extracting it.
  3. The reader DLL reads and uses the data.

Question

What is the recommended way for DLLs to share data?


Approach 1

The reader DLL passes a file path argument to the writer DLL, then reads the data from that file.

Cons

I'd like to avoid writing data to disk. Even if it is the most robust solution, I'd like to explore other options, since writing data to disk when you don't need it there doesn't seem very elegant to me.


Approach 2

The writer DLL allocates a buffer on the heap and returns its address and size to the reader DLL.

Cons

The reader DLL must free the memory. Is that feasible? Can it delete memory given only the address and size?

Also, allocating and freeing a buffer across modules/languages is probably a big NO-NO.


Approach 3

Split GetData() into two calls.

  1. The reader DLL calls GetDataSize()
  2. The reader DLL allocates a buffer and passes its address to the writer DLL
  3. The writer DLL fills the buffer
  4. The reader DLL uses the buffer
  5. The reader DLL frees the buffer

This is the standard WINAPI approach.
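A minimal sketch of the two-call flow, assuming the writer exposes C-callable functions named GetDataSize and GetData (the names come from the question; the signatures, the byte-buffer type and the hard-coded payload are hypothetical). Both sides sit in one file here for illustration; in practice each half would be exported from its own DLL. Because the reader both allocates and frees, memory never crosses a heap boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// ---- Writer side (would be exported from the writer DLL) ----
// Hypothetical data the writer has already prepared (downloaded, extracted, ...).
static const char kPrepared[] = "payload from the writer";

// Step 1: report how many bytes the reader must allocate.
extern "C" std::size_t GetDataSize() {
    return sizeof(kPrepared) - 1;  // exclude the terminating NUL
}

// Step 3: fill a caller-owned buffer; returns the number of bytes written,
// or 0 if the buffer is missing or too small.
extern "C" std::size_t GetData(unsigned char* buffer, std::size_t capacity) {
    std::size_t needed = GetDataSize();
    if (buffer == nullptr || capacity < needed) return 0;
    std::memcpy(buffer, kPrepared, needed);
    return needed;
}

// ---- Reader side (C++ DLL) ----
// Steps 2, 4 and 5: the reader allocates, uses and frees the buffer itself.
std::vector<unsigned char> ReadFromWriter() {
    std::vector<unsigned char> buffer(GetDataSize());
    std::size_t written = GetData(buffer.data(), buffer.size());
    assert(written == buffer.size());
    buffer.resize(written);
    return buffer;  // released by the vector's destructor, on the reader's heap
}
```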

Cons

This assumes that the writer DLL can know the size of the data before writing it, which is not always the case.


Approach 4

Use Windows file mapping.

Cons

Similar cons to Approach 2 & 3.

Upvotes: 0

Views: 1502

Answers (2)

Matteo Italia

Reputation: 126777

Note: as we are talking about passing data between two different languages, I'm going to assume we are talking about "raw" data (primitive types, PODs & co.) that don't need any special treatment on destruction; if this is not the case, please tell me in the comments.

  1. Obviously feasible, but I wouldn't consider it unless desperate. The two DLLs live in the same virtual address space, so they can share data directly in memory without going through the disk.

  2. Feasible and routinely done; the problem you must generally work around is that the "default" heap of a given module is often private¹, so allocating from one and freeing from the other is a big no-no. There are two typical ways to implement this:

    • go through a heap that is surely available to both modules; in Win32, you'll often find LocalAlloc/LocalFree (or other Win32 API-provided heap primitives) used for this, as they are logically "below" every language runtime and provide a shared heap available to all the modules in the current process. So one side knows that it must allocate using LocalAlloc, the other side knows that this data must be deallocated using LocalFree, and everything works fine;
    • the allocating module also provides a deallocation function for the memory it allocates; the client code knows that whatever it received allocated by module A must be freed using the A_free() function. This in turn will probably just wrap your language's deallocation function, to be used as a counterpart to the allocations you do in the "business logic" exported functions. By the way, it may be useful to have an A_malloc() as well, to mark the allocations that are expected to be freed by A_free(): even though they may be plain malloc/free today, you may want to change this later.
  3. Routinely done as well; Win32 APIs often have a special invocation form (typically passing a NULL buffer or a zero size) that retrieves the size to allocate. This may be cumbersome to use or implement if the size cannot be computed easily without actually doing whatever the function has to do, or if the size fluctuates (the Win32 APIs that retrieve process data come to mind: you may have to loop, increasing the allocation each time, in case the data grows between one call and the next).

  4. Can be done, although I've never seen it done for in-process data; the allocation overhead is going to be bigger than with any "regular" heap function, but nothing like writing to a file. In general it's more cumbersome than the LocalAlloc/LocalFree solution for no particular gain, so I wouldn't bother.
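When the size can change between the query and the fill (the process-data case from point 3), the reader has to loop. This sketch assumes a hypothetical GetItems export that always reports its current size through an out-parameter and fails if the buffer is too small; the simulated burst of new items and the 50% headroom are illustrative choices. The retry count is bounded so that a writer whose data keeps outgrowing the buffer cannot spin the reader forever.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- Writer side (hypothetical GetItems export) ----
static int g_calls = 0;

// Simulates a data set that changes between calls, like a process list:
// 3 items on the first call, then a burst brings it to 20.
static std::size_t CurrentItemCount() {
    return (g_calls++ == 0) ? 3 : 20;
}

// Always reports the size it needs right now through *needed; returns 0
// (writing nothing) if the buffer is null or too small, 1 on success.
extern "C" int GetItems(int* buffer, std::size_t capacity, std::size_t* needed) {
    assert(needed != nullptr);
    std::size_t count = CurrentItemCount();
    *needed = count;
    if (buffer == nullptr || capacity < count) return 0;
    for (std::size_t i = 0; i < count; ++i) buffer[i] = static_cast<int>(i);
    return 1;
}

// ---- Reader side ----
// Query, allocate with some headroom, and retry while the data keeps growing.
std::vector<int> ReadItems() {
    std::size_t needed = 0;
    GetItems(nullptr, 0, &needed);  // size-only query
    std::vector<int> buffer;
    for (int attempt = 0; attempt < 8; ++attempt) {  // bounded retries
        buffer.resize(needed + needed / 2 + 1);
        if (GetItems(buffer.data(), buffer.size(), &needed)) {
            buffer.resize(needed);  // keep only what was written
            return buffer;
        }
        // Too small: `needed` now holds the writer's latest size.
    }
    return {};  // gave up; the caller decides how to report this
}
```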

Personally, I'd go with option 2: it's trivial to implement and doesn't require big changes to how you would usually write this stuff in C. The only difference is that you must use a particular pair of allocation/deallocation functions when working with this data.
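A minimal sketch of the allocation/deallocation pair from option 2, assuming module A exports A_malloc/A_free (named as in the answer, here simply wrapping malloc/free) and a hypothetical GetData that returns a buffer it allocated, plus its size. Since A allocates, the size doesn't have to be known up front. On the C++ side, a std::unique_ptr with a custom deleter makes sure the buffer always goes back to A_free, so the reader can't call the wrong free by accident.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// ---- Writer side (module "A") ----
// Every buffer A hands out comes from A_malloc and must go back to A_free,
// so A remains free to change its allocator later.
extern "C" void* A_malloc(std::size_t size) { return std::malloc(size); }
extern "C" void A_free(void* p) { std::free(p); }

// Hypothetical export: allocates the result itself; returns nullptr on failure.
extern "C" unsigned char* GetData(std::size_t* size) {
    static const char kPayload[] = "data prepared by A";
    std::size_t n = sizeof(kPayload) - 1;
    auto* out = static_cast<unsigned char*>(A_malloc(n));
    if (out == nullptr) return nullptr;
    std::memcpy(out, kPayload, n);
    *size = n;
    return out;
}

// ---- Reader side (C++) ----
// A deleter that routes the buffer back to the module that allocated it.
struct AFree {
    void operator()(unsigned char* p) const { A_free(p); }
};
using ABuffer = std::unique_ptr<unsigned char[], AFree>;

ABuffer FetchData(std::size_t& size) {
    size = 0;
    ABuffer buf(GetData(&size));
    assert(buf != nullptr || size == 0);
    return buf;  // A_free runs automatically when the caller's ABuffer dies
}
```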

An extra possibility that comes to mind is to have your function take the allocation function as a callback parameter (and possibly a deallocation function as well, if your algorithm needs it; dynamically growing arrays come to mind). The caller supplies it, so the called DLL allocates from whatever heap the caller likes most.
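The callback variant can be sketched as follows, assuming hypothetical AllocFn/FreeFn callback types and a hypothetical ProduceData export. The writer doesn't know the final size, so it grows its buffer through the caller's allocator; the result therefore lives on the caller's heap, and the caller releases it with its own free function. The g_live counter exists only to show that every block is accounted for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Callback types the caller supplies (hypothetical signatures).
typedef void* (*AllocFn)(std::size_t size);
typedef void (*FreeFn)(void* p);

// ---- Writer side ----
// Produces an unknown number of bytes; all memory comes from the caller's
// allocator, so the caller can release the result with its own free.
extern "C" unsigned char* ProduceData(AllocFn alloc, FreeFn dealloc, std::size_t* size) {
    assert(alloc != nullptr && dealloc != nullptr && size != nullptr);
    std::size_t cap = 4, len = 0;
    auto* buf = static_cast<unsigned char*>(alloc(cap));
    if (buf == nullptr) return nullptr;
    for (int chunk = 0; chunk < 10; ++chunk) {  // data arrives piece by piece
        if (len == cap) {  // grow: allocate, copy, release the old block
            auto* bigger = static_cast<unsigned char*>(alloc(cap * 2));
            if (bigger == nullptr) { dealloc(buf); return nullptr; }
            std::memcpy(bigger, buf, len);
            dealloc(buf);
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = static_cast<unsigned char>('0' + chunk);
    }
    *size = len;
    return buf;
}

// ---- Reader side: its own heap functions, passed in as callbacks ----
static int g_live = 0;  // outstanding allocations, for illustration only
static void* ReaderAlloc(std::size_t n) {
    void* p = std::malloc(n);
    if (p != nullptr) ++g_live;
    return p;
}
static void ReaderFree(void* p) {
    if (p != nullptr) --g_live;
    std::free(p);
}
```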


Notes

  1. Although it can be shared: e.g. if the two modules link dynamically against the same C runtime, it probably is. OTOH, if the two modules are written in different languages, this is highly unlikely.

Upvotes: 2

Christophe

Reputation: 73366

The DLLs all run in the same process and address space. So they can share any data directly in memory. The challenge is only how to give access to the data, especially if you use different languages.

  • Option 1 is easy because you just need to pass the common file name to the reader. But why this overhead? There's a lighter string variant: if you manage to pass a file name as a string, you could just as well let the writer serialise the data into a string and pass that to the reader.

  • Option 2 is more delicate. It's OK if memory is allocated and deallocated on the same side of the DLL boundary, or if you allocate your buffer using the Windows API. Otherwise it can be tricky, because memory allocation crosses DLL boundaries with difficulty (not because of the address space, but because of the risk of using different heaps and different allocation/release routines). Furthermore, you can't be sure that the calling programme manages the C++ object lifecycle properly (if you use some RAII design on the C++ side). So this is an option only if the reader manages the buffer lifecycle:

    • the caller asks the reader to allocate, then gives the writer the address of the buffer, then calls the reader again to process the data and release the buffer;
    • a fixed buffer size is acceptable, i.e. the size of the data is known.
  • Option 3 is option 2 done well

  • Option 4 still has the disk overhead if you use mapped file I/O, with an additional question: can you map the same file twice in the same process? If you're tempted by this option, have a second look at the string-based variant I proposed for option 1 above: the shared string plays the role of the memory mapping without the inconvenience of the file.

The string variant seems an easy way to jump over language barriers with complex data structures. Almost every language has strings. The producer can build its string without having to know the size of the data in advance. Finally, even if strings are managed differently across languages, there's always a way to access them read-only.
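A minimal sketch of the string variant, assuming a hypothetical GetDataString export that returns a pointer and length into a string the writer keeps ownership of (valid until the next call), and a made-up "key=value" per line format. The writer builds its text without knowing the size in advance; the reader copies and parses it immediately, so no buffer ownership crosses the boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// ---- Writer side ----
// Builds the text incrementally; no size needs to be known in advance.
// The writer keeps ownership; the pointer stays valid until the next call.
extern "C" const char* GetDataString(std::size_t* length) {
    static std::string text;
    text.clear();
    text += "name=sensor-7\n";
    text += "count=42\n";
    text += "unit=ms\n";
    *length = text.size();
    return text.c_str();
}

// ---- Reader side ----
// Copies the read-only string and parses one "key=value" pair per line
// (a hypothetical format; JSON or similar would follow the same pattern).
std::map<std::string, std::string> ReadRecords() {
    std::size_t length = 0;
    const char* raw = GetDataString(&length);
    assert(raw != nullptr);
    std::istringstream in(std::string(raw, length));
    std::map<std::string, std::string> records;
    for (std::string line; std::getline(in, line);) {
        std::size_t eq = line.find('=');
        if (eq != std::string::npos) records[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return records;
}
```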

But my preferred variant would be to organize the whole thing so that the writer (or the main programme acting as mediator) calls the reader's processing functions as needed (when parts of the data are available), passing the data to those function calls as arguments of well-defined types.
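The callback-driven design could look like this sketch, assuming a hypothetical plain-layout Record struct, a RecordCallback type and a ProduceRecords export on the writer side. Each record only lives for the duration of the callback, so neither side ever frees memory owned by the other; the void* context lets the reader keep its own state without globals.

```cpp
#include <cassert>
#include <cstddef>

// A plain C-layout type both languages can describe (hypothetical).
extern "C" {
struct Record {
    int id;
    double value;
};
typedef void (*RecordCallback)(void* context, const Record* record);
}

// ---- Writer side ----
// Pushes each record to the reader as soon as it is ready; the record only
// exists during the call, so no buffer ownership crosses modules.
extern "C" int ProduceRecords(RecordCallback onRecord, void* context) {
    if (onRecord == nullptr) return 0;
    for (int i = 1; i <= 4; ++i) {
        Record r{i, i * 1.5};  // e.g. one downloaded or extracted item
        onRecord(context, &r);
    }
    return 4;  // number of records delivered
}

// ---- Reader side ----
struct Totals {
    int count = 0;
    double sum = 0.0;
};

// The reader's processing function, handed to the writer as a callback.
extern "C" void ProcessRecord(void* context, const Record* record) {
    assert(context != nullptr && record != nullptr);
    auto* totals = static_cast<Totals*>(context);
    ++totals->count;
    totals->sum += record->value;
}
```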

Upvotes: 1
