astifter

Reputation: 85

How to design a versioned C API

I would like to design a C API that provides several versions of itself in a single library.

I came across the description of the FoundationDB C API, but since the FoundationDB source code is no longer available, I can't figure out how they did it. Every other library I know of provides one API in a given version of the library, and you have to link against a specific version to get the desired API.

I'm fully aware that supporting old API versions is a major hassle, and I will try to get the API right the first time. But since this API gets deployed on geographically distributed systems with little opportunity for maintenance, I would still like to be able to update only my library without breaking other software.

With object-oriented languages the task is easier/trivial (depending on the language) but for C?

Upvotes: 7

Views: 5011

Answers (4)

user4842163

Reputation:

One thing I'd strongly recommend is to use a single version number for your entire library. I say this after seeing so much suffering in a previous codebase that didn't do this.

The previous codebase used a plugin architecture. Plugins were passed a function pointer like this:

EXPORT void my_plugin_func(LookupFunction* lookup)
{
     // Retrieve version 3 of the drawing interface.
     struct DrawingInterface3* drawer = lookup(DRAWING_INTERFACE, 3);

     // Retrieve version 1 of the widget interface.
     struct WidgetInterface1* widget = lookup(WIDGET_INTERFACE, 1);

     // Retrieve the latest version of the brush interface.
     struct BrushInterface* brush = lookup(BRUSH_INTERFACE, BRUSH_INTERFACE_VER);
     ...
}

While that might seem cool given how you can mix and match and use whatever available version of an interface you want on a per-interface level, you can probably begin to imagine what a maintenance nightmare this is.

First, because there are so many version numbers to deal with (one for every single interface available), there are many ways for developers to go wrong, and in a large team things will occasionally go wrong. It wasn't uncommon for developers to bump a version number two or more times in the same cycle or, far worse, to modify an interface outright and forget to increment its version number.

This latter mistake is disastrous if the SDK ships with it, because it breaks all the plugins ever written for, say, DrawingInterface3: they're no longer binary compatible. Meanwhile, everyone now writing plugins against DrawingInterface3, which should actually have been DrawingInterface4 (the person who updated the interface forgot to version it off), will need to recompile all their plugins against the fixed SDK, and not all of them will (some release a plugin and stop maintaining it). So even after we fix the problem for plugins that were genuinely written against DrawingInterface3, every plugin built against what should have been DrawingInterface4 stays permanently broken.

But this isn't even the worst problem. The worst is that the code under the hood now has to handle an explosive number of interface combinations. Someone might want to draw to Widget version 7 using Drawing interface version 2, and that might require a completely separate code path from drawing to Widget version 3 using Drawing interface version 5. Even with some clever polymorphic solutions under the hood, this led to an insane amount of code being required every time any interface version was bumped, with barely any practical benefit.

So above all else, I recommend keeping a single version number for the entire library/SDK. You can do something like this:

// Tell the system what SDK version we're using.
EXPORT int32_t my_plugin_version(void)
{
     return SDK_VERSION_NUMBER;
}

// The system will now know what interfaces to provide
// to this plugin after calling the above function.
EXPORT void my_plugin_func(LookupFunction* lookup)
{
     // Retrieve latest version of the drawing interface.
     struct DrawingInterface* drawer = lookup(DRAWING_INTERFACE);

     // Retrieve latest version of the widget interface.
     struct WidgetInterface* widget = lookup(WIDGET_INTERFACE);

     // Retrieve latest version of the brush interface.
     struct BrushInterface* brush = lookup(BRUSH_INTERFACE);
     ...
}

"With object-oriented languages the task is easier/trivial (depending on the language) but for C?"

For me, it's not really OOP that makes things trivial. In fact, OOP in native code can make things much harder. For example, versioning a dylib and ensuring wide compatibility with C++ can be a nightmare that requires forgoing many of C++'s features (exception handling, virtual functions, the standard library, and objects in general if you want to target not only other C++ compilers but also FFIs in other languages). What tends to make it easy or hard is how the code is linked dynamically. With languages that use a JIT or an interpreter, that's much simpler.

With native code, it tends to be a lot more complicated because it introduces all kinds of ABI issues, down to things like calling conventions, since your library has to be used directly in its binary form instead of being compiled and linked on the fly for the user's particular machine, standard library, etc. With non-native code, it's a bit like open-sourcing your library (without actually doing so, instead shipping intermediate code such as IR or bytecode that is compiled on the fly). Naturally, that makes things a lot simpler, since you aren't shipping native binaries to the user.

My primary reason for using a C API rather than C++ is that C APIs are so much simpler and more universally compatible. I use C++ to implement all the C APIs, but my life has become a lot simpler since I started using C strictly for the API interfaces themselves (combined with a single version number for the entire SDK).

Convenience and Safety

I picked this up from the comments, so here is some advice: don't bother trying to make your APIs (what is actually dynamically linked) very convenient and safe to use. Otherwise you can multiply your versioning maintenance effort many times over, unless your library is fairly trivial (and fairly trivial libraries typically aren't that concerned with versioning-related architectural issues).

Instead, if you want people using your library to have nice, convenient interfaces that are safe to use, give them a static library with wrappers on top of your exported API. This statically linked library is not something you have to maintain for backwards binary compatibility, because it actually gets built by the users of your library. This static "convenience/helper" library can be as convenient as you want.

The reason I suggest this is that if you try too hard to make your raw exported API really convenient to use, you can end up maintaining 10 times the legacy code. It's like refactoring minus all the benefits, because now you have to maintain the old versions of the "not-so-convenient" function implementations, the newer versions of the moderately "convenient/safe" functions, the newest versions, and so on. The legacy code you have to maintain can grow to an explosive amount because of all the new functions and changes you keep introducing (while deprecating old ones) just to make your exported API more and more convenient and safe to use.

So I recommend against that and suggest focusing on that sort of thing in the static library instead. Don't export more functions for convenience or safety; export new functions only when they provide required functionality that wasn't there before. Of course, you might still have to maintain "source compatibility" for the convenience library to some degree, but people writing code against your library will probably forgive you if they have to change some code every few years to build against the newest version. Binary compatibility is different, because you might find, 10 years from now, that you still can't remove 10-year-old legacy code because users still find old binaries built against the old version useful. So it really helps not to have too much legacy code, and you'll tend to have less if you aren't trying to make your exported functions (the raw, unwrapped ones) as convenient as possible. The wrappers already compiled into users' binaries cannot break either, no matter what changes internally, as long as you maintain binary compatibility with the older versions of the exported APIs, so again it helps to keep the binary-compatibility target as small as possible.

Besides that, trying to make your C APIs convenient and safe is often in vain. For example, no amount of safety you impose on top of a C API will satisfy C++ developers, who practically require RAII conformance because of exception handling, without them writing their own wrappers on top of your library. C# developers would never want to use the library in raw form; they'd be even more extreme than the C++ developers.

So people are often going to write safe wrappers on top of your library anyway. To me, the most productive route if you want a safe, nice library of non-trivial scale (something that spans, say, hundreds of headers) is to export only the required functionality, with no convenience/helper functions, and build the convenience/helper layer in a separate statically linked library whose source code you hand directly to users to build.

printf

I like printf as an example of this. printf is a variadic function, and variadic functions are very unsafe to use in C and often a tripping point for developers. On the flip side, it's an ancient function that has been around for decades and is still relevant today without requiring a printf_ver2, printf_ver3, and so on, because its variadic nature (driven by the format string) allows it to be extended without introducing new functions.

So I often see the sweet spot as having something like printf, which lets you extend it in future versions without introducing a boatload of new functions and legacy code to maintain, while simultaneously providing safe wrappers on top (so your printf-like function is only used in one place, the implementation of those wrappers, and never directly by users). That combination gives you a small target to maintain for backwards compatibility while still providing something much safer and more convenient to use. For long-lived libraries, it helps a lot to prioritize maintainability and extensibility first and tackle convenience and safety for users separately, because the maintenance cost of a long-lived library can become astronomical over time if you aren't careful to keep the exported binary target as small and minimalist as possible. And it's definitely not fun to maintain a boatload of 20-year-old code that only has to exist because some people are still using software written against it.

Easy Extensions

One last thing of note is that in C there are changes you can make without affecting binary compatibility or having to version off interfaces. For example, you can add fields to the end of a struct without affecting binary compatibility, provided the users of the struct don't need to know its size (e.g., they never instantiate it themselves). In that case, code built against an older version of your library can be handed a pointer to an instance of the latest struct; it simply won't see the new fields, since it was compiled against the older definition (i.e., it doesn't have the latest headers). Everything it does see will still work exactly as before (provided you didn't change the behavior of the existing functions when you added the new fields). So there's a lot of room for additions that, done properly, preserve the ABI without the pain of bumping your library version and implementing whole new interfaces.

I recommend exploiting that as much as possible, since that's another problem I saw in a former codebase. Some developers bumped SDK versions for changes that affected neither the ABI nor the functionality for users of older versions of the library, and that needlessly created whole new code branches to maintain. Again, the maintenance effort multiplies with every version you add, so it helps to find as many ways as possible to avoid versioning things off and to keep the amount of code you have to maintain for each version as small as possible.

ABI compatibility is also tricky, so it's really helpful to have unit tests for each of the older versions of an interface to make sure they still work as they should after newer versions are introduced. You don't even have to rebuild the unit tests for older versions, since their purpose is to check binary compatibility: you can archive their executables and run them in CI, for example, without building the source code again (in fact, there are arguments against rebuilding them, since the point is to ensure that old binaries built against older versions still work with the newest binaries of your library). Those unit tests will also clear up doubts, as you navigate the landmines of ABI and backwards compatibility, about whether a change will affect existing binaries and whether you need to version off a brand-new interface and implementation or can simply modify the existing one(s).

Upvotes: 11

Some programmer dude

Reputation: 409196

I don't know how the FoundationDB C API works or is designed, but one way is to emulate inheritance in C using structures and function pointers.

You start with a base structure, something like:

struct base_api
{
    int version;
};

Then you "inherit" (or extend) this base structure:

struct version_1_api
{
    struct base_api base;
    // Function pointers for version 1 of the API
};

struct version_2_api
{
    struct base_api base;
    // Function pointers for version 1 of the API
    // Function pointers for version 2 of the API
};

Then have one exported function that returns a pointer to struct base_api, which the application can check and then cast to a pointer to the appropriate structure:

struct base_api *api = library_get_api();
if (api->version >= 2)
{
    // We have at least version 2 of the API available
    struct version_2_api *api2 = (struct version_2_api *) api;
    // Use version 2 of the API
}
else if (api->version >= 1)
{
    // We have version 1 of the API available
    struct version_1_api *api1 = (struct version_1_api *) api;
    // Use version 1 of the API
}
else
{
    // Unsupported version
}

The library_get_api function in the above example simply returns a pointer to a static structure, something like:

struct base_api *library_get_api(void)
{
    static struct version_2_api api = {
        { 2 }, // Version
        // Function pointers for version 1
        // Function pointers for version 2
    };

    // base is the first member, so its address is also the address of api.
    return &api.base;
}

Upvotes: 7

Anonymous Coward
Anonymous Coward

Reputation: 3200

Function pointers.

For every function in your library you declare a function pointer variable:

return_type (*function_name_impl)(parameters);

You implement that function several times, once for each version you need, so you have function_name_VERSION_1, function_name_VERSION_2, etc.

A version-choosing function assigns the proper pointer to each function pointer variable.

Finally, a macro function_name is used so that your code can just call the needed function without having to select the API version each time and without having to use the syntax for function pointers.

This strategy has an important advantage: if you have already implemented the first version of your API and it is already in use in source form, you can transform it into a multiversion API using this strategy, and the sources using your API need no changes at all except for a call to setVersion().

library.h :

#ifndef LIBRARY_H
#define LIBRARY_H

#include <errno.h>

#define VERSION_1 1
#define VERSION_2 2

/**
 * Example of library function
 */
#define compute(a,b) ((*compute_impl)((a),(b)))

extern int (*compute_impl)( int a, int b);


/**
 * Set version of library to be used.
 * Sets errno to 0 on success, or to a non-zero value if the requested
 * version is not available.
 */

extern void setVersion( int version );

#endif // LIBRARY_H

library.c :

#include "library.h"

int (*compute_impl)( int a, int b);

int compute_VERSION_1( int a, int b)
{
  return a+b;
}

int compute_VERSION_2( int a, int b)
{
  return a+b+1;
}

/**
 * Set version of library to be used.
 * Sets errno to 0 on success, or to a non-zero value if the requested
 * version is not available.
 */
void setVersion( int version )
{
  switch( version )
  {
    case VERSION_1 :
      compute_impl = &compute_VERSION_1;
      break;
    case VERSION_2 :
      compute_impl = &compute_VERSION_2;
      break;
    default :
      errno = EINVAL;  /* requested version is not available */
      return;
  }
  errno = 0;
}

main.c :

#include <stdio.h>
#include "library.h"

int main(void)
{
  int j;

  setVersion( VERSION_2 );
  if ( errno )  {
    printf("API version requested not available\n");
    return 1;
  }
  j = compute( 3, 7 );
  printf("%d\n", j );
  return 0;
}

Upvotes: 2

user3386109
user3386109

Reputation: 34829

In C, you can make all of the functions variadic, with the first parameter indicating the version number, e.g.

int foo( int version, char *buffer, int length, ... )
{
}

That allows you to add more parameters if necessary, but doesn't allow you to change the types of buffer or length. You could, of course, do this:

int foo( int version, ... )

but then even the first version of the function is not self-documenting.


The other option is to pass a pointer to a structure, e.g.

struct FooParams
{
    int version;
    char *buffer;
    int length;
};

int foo( struct FooParams *params )
{
}

The structure definition should include a size and/or a version, so that you know which structure the caller is using.

Upvotes: 2
