6502

Reputation: 114461

Why not allow std::string initialization from an array of chars?

In C++ you can initialize a std::string object from a char * or a const char *, and this implicitly assumes that the string ends at the first NUL character found after the pointer.

String literals in C++ are arrays, however, so a constructor template could deduce the correct size even when the literal contains embedded NULs. See for example the following toy implementation:

#include <stdio.h>
#include <string.h>
#include <vector>
#include <string>

struct String {
    std::vector<char> data;
    int size() const { return data.size(); }

    template<typename T> String(const T s);

    // Hack: the array will also possibly contain an ending NUL
    // we don't want...
    template<int N> String(const char (&s)[N])
        : data(s, s+N-(N>0 && s[N-1]=='\0')) {}

    // The non-const array overload is removed because a lot of code
    // probably builds strings in char arrays and then converts them
    // implicitly to string objects.
    //template<int N> String(char (&s)[N]) : data(s, s+N) {}
};

// (One tricky part is that you cannot just declare a constructor
// accepting a `const char *`, because that would win over the template
// constructor. Here I made that constructor a template too, but I'm
// no template programming guru and maybe there are better ways.)
template<> String::String(const char *s) : data(s, s+strlen(s)) {}

int main(int argc, const char *argv[]) {
    String s1 = "Hello\0world\n";
    printf("Length s1 -> %i\n", s1.size());
    const char *s2 = "Hello\0world\n";
    printf("Length s2 -> %i\n", String(s2).size());
    std::string s3 = "Hello\0world\n";
    printf("std::string size = %i\n", int(s3.size()));
    return 0;
}

Is there any specific technical reason why this approach wasn't considered for the standard, so that a string literal with embedded NULs instead ends up truncated when used to initialize a std::string object?

Upvotes: 2

Views: 2636

Answers (2)

Cheers and hth. - Alf

Reputation: 145204

Initializing a std::string with a literal that contains embedded null bytes requires passing both the starting pointer and the length to a constructor.

That's easiest if there is a dedicated takes-array-reference constructor template, but as you note

  • such a template, with only the array argument, would be considered a worse match than the constructor taking simply char const*, and

  • it would be unclear whether the final terminating null should be included or not.
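The first bullet can be checked at compile time. This is a minimal sketch, assuming a hypothetical Probe type (not part of any library) that only records which constructor overload resolution selected. Both constructors are exact matches for a literal, and the non-template one wins the tie:

```cpp
#include <cstddef>

// Hypothetical probe type: records which constructor was picked.
struct Probe {
    enum class Picked { pointer, array };
    Picked picked;

    // Plain pointer constructor, shaped like std::string's.
    constexpr Probe(char const*) : picked(Picked::pointer) {}

    // Array-reference constructor template that could see the full length.
    template <std::size_t N>
    constexpr Probe(char const (&)[N]) : picked(Picked::array) {}
};

// A string literal is an array, yet the non-template pointer overload is chosen.
static_assert(Probe("Hello\0world").picked == Probe::Picked::pointer,
              "non-template constructor is preferred over the template");
```

This is why the toy String in the question has to turn the pointer constructor into a template as well.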

The first point means that the physical code interface would be a single templated constructor, where only the documentation (and not, for example, your editor's tooltip) would tell the full story about what it accepts. One fix is to introduce an additional dummy resolver argument, but that reduces convenience.
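A dummy resolver argument could look roughly like the following. This is only a sketch: Text, from_array_t and from_array are hypothetical names invented for illustration, and the choice to keep the terminating null in the tagged overload is an assumption, not something std::string offers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical tag type; its only job is to select the array overload.
struct from_array_t {};
constexpr from_array_t from_array{};

class Text {
public:
    // Ordinary C-string constructor: stops at the first NUL.
    Text(char const* s) : data_(s) {}

    // Tagged array constructor: copies every element, terminator included.
    template <std::size_t N>
    Text(from_array_t, char const (&s)[N]) : data_(s, s + N) {}

    std::size_t size() const { return data_.size(); }

private:
    std::string data_;
};

// Small self-check showing the two behaviours side by side.
inline void demo_tagged() {
    assert(Text("Hello\0world").size() == 5);               // truncated at the NUL
    assert(Text(from_array, "Hello\0world").size() == 12);  // 11 chars + terminator
}
```

The tag removes the ambiguity with the pointer constructor, at the cost of an extra argument at every call site.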

The second point is an opportunity for introducing bugs. The most common use of the constructor would no doubt be ordinary string literals. Then, now and then, it would be used for literals and/or arrays with embedded null bytes, but curiously with the last character chopped off.

Instead one can simply first name the value,

char const data[] = "*.com\0*.exe\0*.bat\0*.cmd\0";
string s( data, data + sizeof( data ) );    // Including 2 nulls at end.

All that said, when I've defined my own string classes I've included the takes-array-argument constructor, but for a very different reason than convenience. Namely, that in the case of a literal the string object can simply hold on to that pointer, with no copying, which provides not only efficiency but also safety (correctness) for e.g. exceptions. And an array of const char is the clearest indication of a literal that we have in C++11 and later.

However, a std::string can't do this: it's not designed for it.
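To illustrate the no-copy idea, here is a minimal sketch of a non-owning holder. LiteralRef is a hypothetical name, and this is not the class described above; it also assumes the last array element is the terminator, which holds for string literals but not for every char array.

```cpp
#include <cstddef>

// Hypothetical non-owning holder: stores a pointer to the array, never copies.
class LiteralRef {
public:
    template <std::size_t N>
    constexpr LiteralRef(char const (&s)[N])
        : ptr_(s), len_(N - 1) {}   // assumes s[N-1] is the terminating null

    constexpr char const* data() const { return ptr_; }
    constexpr std::size_t size() const { return len_; }

private:
    char const* ptr_;
    std::size_t len_;
};

// The embedded NUL is counted, because the length comes from the array type.
static_assert(LiteralRef("Hello\0world").size() == 11, "length taken from the array bound");
```

Since nothing is allocated, construction cannot throw. The holder is only safe while the referenced array is alive, which is guaranteed for string literals but not for a local const char array.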


If this is often done then one might define a function like this:

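#include <cstddef>      // std::ptrdiff_t
#include <string>

using Size = std::ptrdiff_t;

// Copies the whole array, including the implicit terminating null.
template< Size n >
auto string_from_data( char const (&data)[n] )
    -> std::string
{ return std::string( data, data + n ); }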

Then one can write just

string const s = string_from_data( "*.com\0*.exe\0*.bat\0*.cmd\0" );

Disclaimer: none of the code touched or seen by a compiler.


[I missed this on a first writing, but was reminded by Hurkyl's answer. Now heading for coffee!]

A C++14 string type literal chops off the final \0, so with such a literal the above would have to include that terminating null explicitly:

string const s = "*.com\0*.exe\0*.bat\0*.cmd\0\0"s;

Apart from that, C++14 string type literals appear to provide the sought-for convenience.
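The equivalence of the two spellings can be checked with a short sketch. It repeats a variant of the string_from_data helper (using std::size_t for the bound) so that it compiles on its own; compare_spellings is a hypothetical name used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Variant of the helper above: copies the array including its terminator.
template <std::size_t N>
std::string string_from_data(char const (&data)[N]) {
    return std::string(data, data + N);
}

inline void compare_spellings() {
    using namespace std::string_literals;   // operator""s, C++14

    // Array form: 24 characters plus the implicit terminator.
    std::string const a = string_from_data("*.com\0*.exe\0*.bat\0*.cmd\0");
    // Literal form: the implicit terminator is dropped, so one \0 is added by hand.
    std::string const b = "*.com\0*.exe\0*.bat\0*.cmd\0\0"s;

    assert(a.size() == 25);
    assert(a == b);
}
```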

Upvotes: 3

user1084944


C++14 introduces a suffix for string literals to make them into std::string objects, so the main use case is no longer relevant.

#include <iostream>
#include <string>
using namespace std;
using namespace std::literals;

int main() {
    string foo = "Hello\0world\n";
    string bar = "Hello\0world\n"s;
    cout << foo.size() << " " << bar.size() << endl; // 5 12
    cout << foo << endl; // Hello
    cout << bar << endl; // Helloworld
    return 0;
}

Upvotes: 8
