ethanmorton

Reputation: 86

How do in-memory databases (such as redis) communicate with other applications?

I'm implementing a local database that lets me pass data between C and Fortran programs (C for control flow, Fortran for matrix computation).

I understand the idea of an in-memory database, but how do programs get data from it? Do I need to open local ports and make a regular connection to it? Or is there another, lower-level mechanism, such as a system call, that lets programs communicate with it directly?

I keep going back and forth between two designs: one big C program that hosts both the database and the Fortran matrix computation (which would not operate directly on the database), or storing the data in a binary file and reopening it from each program.

I also understand that using someone else's software would be easier and faster, but I want to do it myself to increase my understanding and programming chops.

Upvotes: 1

Views: 436

Answers (1)

Steven Graves

Reputation: 911

I can't speak to redis, but I can tell you about my company's implementation: eXtremeDB.

eXtremeDB is written mostly in C (C++ for the SQL, some assembly e.g. for spinlocks). We offer native and SQL APIs for many languages that can be used interchangeably.

For a mixed-language scenario like the one you describe, the database would be created in shared (named) memory that is mapped into each process's address space. As such, it is an 'embedded' database. The database runtime itself consists of shared libraries that get linked with the application. So, this fits your description "a big C program that runs the database inside". You can substitute 'Fortran' for 'C' as appropriate.
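To make the mapping step concrete, here is a minimal sketch for a POSIX system. It is not eXtremeDB code. The function names `map_shared_region` and `unmap_shared_region` are made up for illustration, and the region is named by a file path passed to `open`; `shm_open` is the more direct POSIX call for named shared memory and could be swapped in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `size` bytes of the named region at `path` into this process.
 * Every process that maps the same path with MAP_SHARED sees the same
 * bytes. Returns NULL on failure. (Hypothetical helper, not a library API.) */
void *map_shared_region(const char *path, size_t size)
{
    assert(path != NULL && size > 0);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;

    /* Make sure the backing object is large enough to be mapped. */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }

    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */

    return base == MAP_FAILED ? NULL : base;
}

/* Remove the mapping from this process; the named region itself persists. */
int unmap_shared_region(void *base, size_t size)
{
    return munmap(base, size);
}
```

Each program would call `map_shared_region` with the same path and size. The returned address will generally differ from process to process, which matters for how objects inside the region refer to each other.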

Accordingly, the processes have direct, very fast access to the stored data through the published interfaces (i.e. there is no inter-process communication overhead, unlike a client/server architecture). The database runtime controls concurrent access. Access can be through SQL (SELECT * FROM table WHERE...) or through a native API. The native API is faster, of course, and for a roll-your-own approach much more tractable (implementing an SQL engine is kind of a big deal).
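For a roll-your-own runtime, one simple way to control concurrent access is a spinlock stored inside the shared region itself. The sketch below uses C11 `atomic_flag`; it is an assumption about how you might do it, not a description of eXtremeDB's internals, and the `shared_counter` type and `sc_*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* A lock plus the data it guards. In a real database this struct would
 * live at a known offset inside the shared memory region. */
typedef struct {
    atomic_flag busy;    /* set while some process holds the lock */
    long        counter; /* example payload protected by the lock */
} shared_counter;

void sc_init(shared_counter *s)
{
    assert(s);
    atomic_flag_clear(&s->busy);
    s->counter = 0;
}

/* Returns true if the lock was free and is now held by the caller. */
bool sc_try_lock(shared_counter *s)
{
    return !atomic_flag_test_and_set_explicit(&s->busy, memory_order_acquire);
}

void sc_lock(shared_counter *s)
{
    while (!sc_try_lock(s)) {
        /* spin until the current holder releases the lock */
    }
}

void sc_unlock(shared_counter *s)
{
    atomic_flag_clear_explicit(&s->busy, memory_order_release);
}

/* Native-API style operation: lock, modify, unlock. */
long sc_increment(shared_counter *s)
{
    sc_lock(s);
    long value = ++s->counter;
    sc_unlock(s);
    return value;
}
```

A bare spinlock like this has no recovery if a process dies while holding it, so a production runtime needs something more robust.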

You'd probably want to implement a 'load' and 'store' interface to save and reload an in-memory database between runs. This is pretty simple: the in-memory database will exist in a contiguous piece of memory (e.g. use the shared memory ops of your OS to allocate 5MB of shared memory and map it to the local address space), which can just be streamed out to persistent media. This implies that you'll create a sub-allocator in your database runtime to dole out smaller chunks for the storage of objects. If there are relationships between objects in the shared memory database, make sure they are stored as offsets rather than direct pointer references, because there's no guarantee that the database will be mapped to the same starting memory address on a subsequent run.
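Here is a rough sketch of those three ideas together: a bump-style sub-allocator over one contiguous arena, records linked by offsets instead of pointers, and store/load that stream the whole arena to a file. Everything here is illustrative: the names (`arena_*`, `record`), the 4096-byte size, and the use of `malloc` as a stand-in for the shared mapping are all assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ARENA_SIZE 4096u

typedef struct {
    uint32_t used; /* bytes handed out so far, header included */
    uint32_t head; /* offset of the newest record, 0 = empty list */
} arena_header;

typedef struct {
    double   value;
    uint32_t next; /* offset of the next record, 0 = end of list */
} record;

/* malloc stands in for the shared-memory mapping in this sketch. */
void *arena_create(void)
{
    arena_header *h = malloc(ARENA_SIZE);
    if (h) {
        memset(h, 0, ARENA_SIZE);
        h->used = (uint32_t)sizeof *h;
    }
    return h;
}

/* Turn an offset into a pointer valid for this particular mapping. */
void *arena_ptr(void *base, uint32_t off)
{
    assert(off < ARENA_SIZE);
    return (char *)base + off;
}

/* Bump allocator: returns an 8-byte-aligned offset, or 0 when full. */
uint32_t arena_alloc(void *base, uint32_t size)
{
    arena_header *h = base;
    uint32_t start = (h->used + 7u) & ~7u;
    if (start > ARENA_SIZE || size > ARENA_SIZE - start)
        return 0;
    h->used = start + size;
    return start;
}

/* Prepend a record to the list; links are stored as offsets. */
int arena_push(void *base, double value)
{
    arena_header *h = base;
    uint32_t off = arena_alloc(base, (uint32_t)sizeof(record));
    if (off == 0)
        return -1;
    record *r = arena_ptr(base, off);
    r->value = value;
    r->next = h->head;
    h->head = off;
    return 0;
}

double arena_sum(void *base)
{
    arena_header *h = base;
    double total = 0.0;
    for (uint32_t off = h->head; off != 0; ) {
        record *r = arena_ptr(base, off);
        total += r->value;
        off = r->next;
    }
    return total;
}

/* 'store': stream the whole arena out to a file. */
int arena_store(const void *base, const char *path)
{
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    size_t n = fwrite(base, 1, ARENA_SIZE, f);
    return (fclose(f) == 0 && n == ARENA_SIZE) ? 0 : -1;
}

/* 'load': read the arena back, at whatever address `base` happens to be. */
int arena_load(void *base, const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t n = fread(base, 1, ARENA_SIZE, f);
    fclose(f);
    arena_header *h = base;
    if (n != ARENA_SIZE || h->used < sizeof *h || h->used > ARENA_SIZE)
        return -1;
    return 0;
}
```

Because the list stores offsets, an arena reloaded at a different base address still walks correctly; with raw pointers, every link would dangle after the reload.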

Upvotes: 1
