HighCommander4

Reputation: 52799

Inconsistency between std::string and string literals

I have discovered a disturbing inconsistency between std::string and string literals in C++0x:

#include <iostream>
#include <string>

int main()
{
    int i = 0;
    for (auto e : "hello")
        ++i;
    std::cout << "Number of elements: " << i << '\n';

    i = 0;
    for (auto e : std::string("hello"))
        ++i;
    std::cout << "Number of elements: " << i << '\n';

    return 0;
}

The output is:

Number of elements: 6
Number of elements: 5

I understand the mechanics of why this is happening: the string literal is really an array of characters that includes the null character, and when the range-based for loop calls std::end() on the character array, it gets a pointer past the end of the array; since the null character is part of the array, it thus gets a pointer past the null character.
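A minimal sketch of those mechanics, assuming a C++11 compiler. The helper names `array_extent` and `chars_before_null` are hypothetical and exist only for illustration: the first measures the distance between std::begin() and std::end() for a built-in array, the second counts characters up to the first null character.

```cpp
#include <cstddef>
#include <cstring>
#include <iterator>

// Hypothetical helper: number of elements delimited by std::begin/std::end
// for a built-in array. For a string literal this includes the trailing '\0'.
template <typename T, std::size_t N>
std::size_t array_extent(const T (&a)[N])
{
    return static_cast<std::size_t>(std::end(a) - std::begin(a));
}

// Hypothetical helper: number of characters before the first null character,
// which is what strlen() reports.
template <std::size_t N>
std::size_t chars_before_null(const char (&a)[N])
{
    return std::strlen(a);
}
```

For the literal "hello", `array_extent` reports the full array size of `const char[6]`, while `chars_before_null` reports the five visible characters.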

However, I think this is very undesirable: surely std::string and string literals should behave the same when it comes to properties as basic as their length?

Is there a way to resolve this inconsistency? For example, can std::begin() and std::end() be overloaded for character arrays so that the range they delimit does not include the terminating null character? If so, why was this not done?

EDIT: To justify my indignation a bit more to those who have said that I'm just suffering the consequences of using C-style strings which are a "legacy feature", consider code like the following:

template <typename Range>
void f(Range&& r)
{
    for (auto e : r)
    {
        ...
    }
}

Would you expect f("hello") and f(std::string("hello")) to do something different?
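To make the difference concrete, here is a sketch of one possible body for such a template. The name `count_elements` is hypothetical and is not the `f` from the snippet above; it simply counts how many elements the range-based for loop visits.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a generic function like f: counts the elements
// that a range-based for loop visits in r.
template <typename Range>
std::size_t count_elements(Range&& r)
{
    std::size_t n = 0;
    for (auto e : r)
    {
        (void)e;  // the element value is unused; only the count matters
        ++n;
    }
    return n;
}
```

Called with the literal "hello", `Range` is deduced as a reference to `const char[6]`, so the loop visits six elements; called with `std::string("hello")`, it visits five.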

Upvotes: 38

Views: 4841

Answers (6)

Ise Wisteria

Reputation: 11669

According to N3290 6.5.4, if the range is an array, boundary values are initialized automatically without begin/end function dispatch.
So, how about preparing some wrapper like the following?

struct literal_t {
    char const *b, *e;
    literal_t( char const* b, char const* e ) : b( b ), e( e ) {}
    char const* begin() const { return b; }
    char const* end  () const { return e; }
};

template< std::size_t N >
literal_t literal( char const (&a)[N] ) {
    return literal_t( a, a + N - 1 ); // N - 1 excludes the terminating '\0'
}

Then the following code will be valid:

for (auto e : literal("hello")) ...

If your compiler supports user-defined literals, they can shorten this:

literal_t operator"" _l( char const* p, std::size_t l ) {
    return literal_t( p, p + l ); // l excludes '\0'
}

for (auto e : "hello"_l) ...

EDIT: The following will have smaller overhead (user-defined literal won't be available though).

template< size_t N >
char const (&literal( char const (&x)[ N ] ))[ N - 1 ] {
    return (char const(&)[ N - 1 ]) x;
}

for (auto e : literal("hello")) ...

Upvotes: 6

HighCommander4

Reputation: 52799

The inconsistency can be resolved using another tool in C++0x's toolbox: user-defined literals. Using an appropriately-defined user-defined literal:

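// User-defined suffixes must start with an underscore; suffixes without one
// are reserved for the standard library.
std::string operator"" _s(const char* p, std::size_t n)
{
    return std::string(p, n);
}

We'll be able to write:

int i = 0;
for (auto e : "hello"_s)
    ++i;
std::cout << "Number of elements: " << i << '\n';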

Which now outputs the expected number:

Number of elements: 5

With these new std::string literals, there is arguably no more reason to use C-style string literals, ever.
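Since C++14 the standard library provides a std::string literal suffix itself, in the namespace std::literals::string_literals. A short sketch, assuming a C++14 or later compiler; the function name `count_with_std_literal` is hypothetical:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: iterates over a std::string created with the
// standard library's "s" literal suffix (C++14) and counts its elements.
std::size_t count_with_std_literal()
{
    using namespace std::string_literals;  // brings operator""s into scope

    std::size_t n = 0;
    for (auto e : "hello"s)  // "hello"s has type std::string
    {
        (void)e;
        ++n;
    }
    return n;
}
```

Because the literal is a std::string, the loop visits only the five characters and not a terminating null character.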

Upvotes: 3

David Hammen

Reputation: 33126

However, I think this is very undesirable: surely std::string and string literals should behave the same when it comes to properties as basic as their length?

String literals by definition have a (hidden) null character at the end of the string. std::strings do not. Because std::strings have a length, that null character is a bit superfluous. The standard section on the string library explicitly allows non-null terminated strings.

Edit
I don't think I've ever given a more controversial answer in the sense of a huge amount of upvotes and a huge amount of downvotes.

The auto iterator when applied to a C-style array iterates over each element of the array. The determination of the range is made at compile-time, not run time. This is ill-formed, for instance:

char * str;
for (auto c : str) {
   do_something_with (c);
}

Some people use arrays of type char to hold arbitrary data. Yes, it is an old-style C way of thinking, and perhaps they should have used a C++-style std::array, but the construct is quite valid and quite useful. Those people would be rather upset if their auto iterator over a char buffer[1024]; stopped at element 15 just because that element happens to have the same value as the null character. An auto iterator over a Type buffer[1024]; will run all the way to the end. What makes a char array so worthy of a completely different implementation?

Note that if you want the auto iterator over a character array to stop early there is an easy mechanism to do that: Add an if (c == '\0') break; statement to the body of your loop.
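A minimal sketch of that early exit, assuming the loop only needs to count characters. The name `count_until_null` is hypothetical:

```cpp
#include <cstddef>

// Hypothetical helper: walks a char array with a range-based for loop and
// stops at the first null character instead of running to the array's end.
template <std::size_t N>
std::size_t count_until_null(const char (&buf)[N])
{
    std::size_t n = 0;
    for (auto c : buf)
    {
        if (c == '\0')
            break;  // stop early at the terminator
        ++n;
    }
    return n;
}
```

The same loop without the break would visit every element of the array, including any bytes after the first null character.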

Bottom line: There is no inconsistency here. The auto iterator over a char[] array is consistent with how the auto iterator works over any other C-style array.

Upvotes: 22

Lightness Races in Orbit

Reputation: 385274

That you get 6 in the first case is an abstraction leak that couldn't be avoided in C. std::string "fixes" that. For compatibility, the behaviour of C-style string literals does not change in C++.

For example, can std::begin() and std::end() be overloaded for character arrays so that the range they delimit does not include the terminating null character? If so, why was this not done?

Assuming access through a pointer (as opposed to char[N]), only by embedding a variable inside the string containing the number of characters, so that searching for the null terminator isn't required any more. Oops! That's std::string.

The way to "resolve the inconsistency" is not to use legacy features at all.

Upvotes: 19

Howard Hinnant

Reputation: 219205

If we overloaded std::begin() and std::end() for const char arrays to return one less than the size of the array, then the following code would output 4 instead of the expected 5:

#include <iostream>

int main()
{
    const char s[5] = {'h', 'e', 'l', 'l', 'o'};
    int i = 0;
    for (auto e : s)
        ++i;
    std::cout << "Number of elements: " << i << '\n';
}

Upvotes: 29

robert

Reputation: 34418

If you wanted the length, you should use strlen() for the C string and .length() for the C++ string. You can't treat C strings and C++ strings identically--they have different behavior.

Upvotes: 4
