Philip P.

Reputation: 2394

What is current best practice around use of strings in cross-platform C and C++ APIs?

It looks like I may need to embark on a cross-platform project, part of which will have to be done in C or C++ (not decided yet, hence the question is about both). I will be dealing mostly with text-based data and strings in general.

That C/C++ will have an API callable from the higher-level platform-dependent code.

My question is: what type(s) is it advisable to use to work with strings, in particular when declaring public interfaces? Are there any recommended standard techniques? Are there things to avoid?

I have little experience writing C or C++ code, and even that was on Windows, so nothing cross-platform at all. What I'm really looking for is something to set me on the right path and help me avoid mistakes that are bound to cause a lot of pain.


Edit 1: To give a bit more context about the intended use. The API will be consumed by:

Upvotes: 10

Views: 907

Answers (4)

user541686

Reputation: 210705

Rules

  • Use UTF formats to store strings, not "code pages" or whatnot. (I originally wrote that UTF-16 is probably easier, but I totally forgot about byte order issues; UTF-8 is probably the way to go.)

  • Use null-terminated strings instead of counted strings, as these are the easiest to access from most languages. But be careful about buffer overflows.
    Update 6 years later: I recommended this API for interoperability reasons (since so many already use null-termination, and there are multiple ways to represent counted strings), not the best one from a best-design standpoint. Today I would probably say the former is less important and recommend using counted strings rather than null-terminated strings if you can do it.

  • Do not even try to use classes like std::string to pass around strings to/from the user. Even your own program can break after upgrading your compiler/libraries (since their implementation detail is just that: an implementation detail), let alone the fact that non-C++ programs will have trouble with it.
    Update 6 years later: This is strictly for language and ABI compatibility reasons with other languages, not general advice for C++ program development. If you're doing C++ development, cross-platform or otherwise, use the STL! i.e. only follow this advice if you need to call your code from other languages.

  • Avoid allocating strings for the user unless it's truly painful for the user otherwise. Instead, take in a buffer and fill it up with data. That way you don't have to force the user to use a particular function to free the data. (This is often a performance advantage as well, since it lets the user allocate small buffers on the stack.) If you do allocate strings for the user, provide your own function to free the data. You can't assume that memory from your malloc or new can be released with their free or delete; often it can't be.

Note:

Just to clarify, "let the user allocate the buffer" and "use null-terminated strings" do not conflict with each other. You still need to get the buffer length from the user, and you still write the null terminator at the end of the string. My point was not that you should make a function similar to scanf("%s"), which is obviously unusably dangerous. In other words, do pretty much what Windows does in this regard.
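
To make the last rule and the note concrete, here is a minimal sketch of both ownership patterns. All names (mylib_copy_version, mylib_dup_version, mylib_free_string, kVersion) are hypothetical and not taken from any real library. The copy function follows an snprintf-like convention that I am assuming for illustration: it always null-terminates and returns the full length so the caller can detect truncation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

namespace {
const char kVersion[] = "1.2.3";  // hypothetical library data, UTF-8
}

extern "C" {

// Preferred: the caller owns the buffer and passes its size in bytes.
// Copies at most buffer_size - 1 bytes, always null-terminates when
// buffer_size > 0, and returns the full length (excluding the terminator)
// so the caller can tell whether the result was truncated.
std::size_t mylib_copy_version(char* buffer, std::size_t buffer_size)
{
    assert(buffer != nullptr || buffer_size == 0);
    const std::size_t full = sizeof kVersion - 1;
    if (buffer != nullptr && buffer_size > 0) {
        const std::size_t n = full < buffer_size - 1 ? full : buffer_size - 1;
        std::memcpy(buffer, kVersion, n);
        buffer[n] = '\0';
    }
    return full;
}

// Fallback: the library allocates, so the library must also free.
// Returns nullptr if the allocation fails.
char* mylib_dup_version(void)
{
    char* copy = static_cast<char*>(std::malloc(sizeof kVersion));
    if (copy != nullptr) {
        std::memcpy(copy, kVersion, sizeof kVersion);
    }
    return copy;
}

// The only correct way to release a string returned by mylib_dup_version.
void mylib_free_string(char* s)
{
    std::free(s);
}

}  // extern "C"
```

A caller with a small stack buffer would call mylib_copy_version(buf, sizeof buf) and compare the return value against sizeof buf to see whether a larger buffer is needed. Note that byte-wise truncation like this can cut a multi-byte UTF-8 sequence in half, so a real API may want to back up to a code point boundary.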

Upvotes: 15

Mark Ransom

Reputation: 308432

A very common way to return a string to a caller is to accept a string buffer pointer and a character count of the buffer size. A useful convention is to return the number of characters copied into the buffer as the return value; this is especially valuable if you treat a buffer size of 0 as a special case and return the number of characters that are required (including the null terminator).

int GetString(char * buffer, int buffersize);
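
For illustration, one possible implementation of that convention might look like the sketch below. The string being returned (kGreeting) is a made-up placeholder, and the behavior for a buffer that is too small (truncate but still null-terminate) is my assumption; the convention described above does not specify it.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical value the library wants to hand out (null-terminated).
static const char kGreeting[] = "hello";

// Returns the number of chars written, including the null terminator.
// A null buffer or a buffer size of 0 is a size query: nothing is written
// and the required size (including the terminator) is returned.
int GetString(char* buffer, int buffersize)
{
    assert(buffersize >= 0);
    const int required = static_cast<int>(sizeof kGreeting);  // includes '\0'
    if (buffer == nullptr || buffersize == 0) {
        return required;
    }
    // Assumption: a too-small buffer gets a truncated, terminated copy.
    const int count = buffersize < required ? buffersize : required;
    std::memcpy(buffer, kGreeting, static_cast<std::size_t>(count - 1));
    buffer[count - 1] = '\0';
    return count;
}
```

The caller first calls GetString(NULL, 0) to learn the size, allocates that many chars, and calls again with the real buffer.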

In C++ it is convenient to work with std::string instead, but this presents a problem: you can't rely on the implementation of std::string to be compatible between differently compiled parts of the program, i.e. between your main program and the library. By providing an inline function in a header file, you can ensure that the std::string is created in the same context as the caller and bypass this problem.

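#include <string>

// Defined inline in the header so the std::string is built by the
// caller's own compiler and standard library.
inline std::string GetString()
{
    // A buffer size of 0 asks for the required size, terminator included.
    std::string result(static_cast<std::string::size_type>(GetString(NULL, 0)), '\0');
    if (result.empty())
        return result;  // nothing to copy; avoids size() - 1 underflow
    GetString(&result[0], static_cast<int>(result.size()));
    result.erase(result.size() - 1);  // drop the null terminator
    return result;
}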

Upvotes: 1

DWoldrich

Reputation: 4017

If you want a ten ton hammer to deal with strings in C/C++, then IBM's ICU project is for you. http://site.icu-project.org/

ICU has all the tools for working with strings, with really good Unicode support. It is an impressive and well-maintained open source product with a favorable license for commercial projects.

If you want to release your code as a .dll/.so for others to call, then you probably want to minimize your external dependencies. You may want to stick to standard libraries or a more lightweight project in that case.

Upvotes: 4

Armen Tsirunyan

Reputation: 133072

That C/C++ will have an API callable from the higher-level platform-dependent code.

If by this you mean that you intend this library to be a DLL which may be called from other languages, for example .NET languages, then I strongly recommend exposing the entire public API as extern "C" functions that have only POD types as parameters and return values. That is, prefer /*const*/ char* over std::string. Remember, C++, unlike plain C, has no standard ABI.

Upvotes: 4
