Reputation: 3170
The raw string literals in C++11 are very nice, except that the obvious way to format them leads to a redundant newline \n as the first character.
Consider this example:
some_code();
std::string text = R"(
This is the first line.
This is the second line.
This is the third line.
)";
more_code();
The obvious workaround seems so ugly:
some_code();
std::string text = R"(This is the first line.
This is the second line.
This is the third line.
)";
more_code();
Has anyone found an elegant solution to this?
Upvotes: 36
Views: 13640
Reputation: 10710
With C++20 this can now be implemented fully at compile time by using a string literal operator template.
One key benefit: the result is a reference to a plain character array (const char (&)[N]), like a normal string literal in C++, so there are no std::array shenanigans or lifetime issues.
Usage example:
std::cout << R"(
a
b
c
d
)"_M << std::endl;
/* Will print the following:
a
b
c
d
*/
// The type of R"(...)"_M is const char (&)[N],
// so it can be used like a normal string literal:
std::cout << std::size(R"(asdf)"_M) << std::endl;
// (will print 5)
constexpr std::string_view str = R"(
foo
bar
)"_M;
// str == "foo\nbar"
// also works with wchar_t, char8_t, char16_t and char32_t literals:
std::wcout << LR"(
foo
bar
)"_M;
std::wcout << std::endl;
Normally it's not possible to pass string literals as template arguments, e.g.:
template<const char* str>
void foo();
// ill-formed
foo<"bar">();
But with C++20 we can now have class-type template arguments, and those could be constant-initialized from a string literal.
That in combination with the new string literal operator templates makes it possible to get the entire string literal as a template parameter:
template<class _char_type, std::size_t size>
struct string_wrapper {
using char_type = _char_type;
consteval string_wrapper(const char_type (&arr)[size]) {
std::ranges::copy(arr, str);
}
char_type str[size];
};
template<string_wrapper str>
consteval decltype(auto) operator"" _M() {
/*...*/
}
// R"(foobar)"_M
// would now result in the following code:
// operator"" _M<string_wrapper<char, 7>{"foobar"}>()
Having both the length and the individual characters as constant expressions now enables us to compute the required size for the unindented string fully at compile time and to store the resulting string in another template parameter (so that we merely need to return a reference to the final string value):
// unindents the individual lines of a raw string literal
// e.g. unindent_string(" \n a\n b\n c\n") -> "a\nb\nc"
template<class char_type>
consteval std::vector<char_type> unindent_string(string_view<char_type> str) {
/* ... */
}
// returns the size required for the unindented string
template<class char_type>
consteval std::size_t unindent_string_size(string_view<char_type> str) {
/* ... */
}
// used for sneakily creating and storing
// the unindented string in a template parameter.
template<string_wrapper sw>
struct unindented_string_wrapper {
using char_type = typename decltype(sw)::char_type;
static constexpr std::size_t buffer_size = unindent_string_size<char_type>(sw.str);
using array_ref = const char_type (&)[buffer_size];
consteval unindented_string_wrapper(int) {
auto newstr = unindent_string<char_type>(sw.str);
std::ranges::copy(newstr, buffer);
}
consteval array_ref get() const {
return buffer;
}
char_type buffer[buffer_size];
};
// uses a defaulted template argument that depends on the str
// to initialize the unindented string within a template parameter.
// this enables us to return a reference to the unindented string.
template<string_wrapper str, unindented_string_wrapper<str> unindented = 0>
consteval decltype(auto) do_unindent() {
return unindented.get();
}
// the actual user-defined string literal operator
template<string_wrapper str>
consteval decltype(auto) operator"" _M() {
return do_unindent<str>();
}
Full code:
#include <algorithm>
#include <cstddef>      // std::size_t
#include <string_view>
#include <utility>      // std::declval
#include <vector>
#include <ranges>
namespace multiline_raw_string {
template<class char_type>
using string_view = std::basic_string_view<char_type>;
// characters that are considered space
// we need this because std::isspace is not constexpr
template<class char_type>
constexpr string_view<char_type> space_chars = std::declval<string_view<char_type>>();
template<>
constexpr string_view<char> space_chars<char> = " \f\n\r\t\v";
template<>
constexpr string_view<wchar_t> space_chars<wchar_t> = L" \f\n\r\t\v";
template<>
constexpr string_view<char8_t> space_chars<char8_t> = u8" \f\n\r\t\v";
template<>
constexpr string_view<char16_t> space_chars<char16_t> = u" \f\n\r\t\v";
template<>
constexpr string_view<char32_t> space_chars<char32_t> = U" \f\n\r\t\v";
// list of all potential line endings that could be encountered
template<class char_type>
constexpr string_view<char_type> potential_line_endings[] = std::declval<string_view<char_type>[]>();
template<>
constexpr string_view<char> potential_line_endings<char>[] = {
"\r\n",
"\r",
"\n"
};
template<>
constexpr string_view<wchar_t> potential_line_endings<wchar_t>[] = {
L"\r\n",
L"\r",
L"\n"
};
template<>
constexpr string_view<char8_t> potential_line_endings<char8_t>[] = {
u8"\r\n",
u8"\r",
u8"\n"
};
template<>
constexpr string_view<char16_t> potential_line_endings<char16_t>[] = {
u"\r\n",
u"\r",
u"\n"
};
template<>
constexpr string_view<char32_t> potential_line_endings<char32_t>[] = {
U"\r\n",
U"\r",
U"\n"
};
// null-terminator for the different character types
template<class char_type>
constexpr char_type null_char = std::declval<char_type>();
template<>
constexpr char null_char<char> = '\0';
template<>
constexpr wchar_t null_char<wchar_t> = L'\0';
template<>
constexpr char8_t null_char<char8_t> = u8'\0';
template<>
constexpr char16_t null_char<char16_t> = u'\0';
template<>
constexpr char32_t null_char<char32_t> = U'\0';
// detects the line ending used within a string.
// e.g. detect_line_ending("foo\nbar\nbaz") -> "\n"
template<class char_type>
consteval string_view<char_type> detect_line_ending(string_view<char_type> str) {
return *std::ranges::max_element(
potential_line_endings<char_type>,
{},
[str](string_view<char_type> line_ending) {
// count the number of lines we would get with line_ending
auto view = std::views::split(str, line_ending);
return std::ranges::distance(view);
}
);
}
// returns a view to the leading sequence of space characters within a string
// e.g. get_leading_space_sequence(" \t foo") -> " \t "
template<class char_type>
consteval string_view<char_type> get_leading_space_sequence(string_view<char_type> line) {
return line.substr(0, line.find_first_not_of(space_chars<char_type>));
}
// checks if a line consists purely out of space characters
// e.g. is_line_empty(" \t") -> true
// is_line_empty(" foo") -> false
template<class char_type>
consteval bool is_line_empty(string_view<char_type> line) {
return get_leading_space_sequence(line).size() == line.size();
}
// splits a string into individual lines
// and removes the first & last line if they are empty
// e.g. split_lines("\na\nb\nc\n", "\n") -> {"a", "b", "c"}
template<class char_type>
consteval std::vector<string_view<char_type>> split_lines(
string_view<char_type> str,
string_view<char_type> line_ending
) {
std::vector<string_view<char_type>> lines;
for (auto line : std::views::split(str, line_ending)) {
lines.emplace_back(line.begin(), line.end());
}
// remove first/last lines in case they are completely empty
if(lines.size() > 1 && is_line_empty(lines[0])) {
lines.erase(lines.begin());
}
if(lines.size() > 1 && is_line_empty(lines[lines.size()-1])) {
lines.erase(lines.end()-1);
}
return lines;
}
// determines the longest possible sequence of space characters
// that we can remove from each line.
// e.g. determine_common_space_prefix_sequence({" \ta", " foo", " \t\tbar"}) -> " "
template<class char_type>
consteval string_view<char_type> determine_common_space_prefix_sequence(
std::vector<string_view<char_type>> const& lines
) {
std::vector<string_view<char_type>> space_sequences = {
string_view<char_type>{} // empty string
};
for(string_view<char_type> line : lines) {
string_view<char_type> spaces = get_leading_space_sequence(line);
for(std::size_t len = 1; len <= spaces.size(); len++) {
space_sequences.emplace_back(spaces.substr(0, len));
}
// remove duplicates
std::ranges::sort(space_sequences);
auto [first, last] = std::ranges::unique(space_sequences);
space_sequences.erase(first, last);
}
// only consider space prefix sequences that apply to all lines
auto shared_prefixes = std::views::filter(
space_sequences,
[&lines](string_view<char_type> prefix) {
return std::ranges::all_of(
lines,
[&prefix](string_view<char_type> line) {
return line.starts_with(prefix);
}
);
}
);
// select the longest possible space prefix sequence
return *std::ranges::max_element(
shared_prefixes,
{},
&string_view<char_type>::size
);
}
// unindents the individual lines of a raw string literal
// e.g. unindent_string(" \n a\n b\n c\n") -> "a\nb\nc"
template<class char_type>
consteval std::vector<char_type> unindent_string(string_view<char_type> str) {
string_view<char_type> line_ending = detect_line_ending(str);
std::vector<string_view<char_type>> lines = split_lines(str, line_ending);
string_view<char_type> common_space_sequence = determine_common_space_prefix_sequence(lines);
std::vector<char_type> new_string;
bool is_first = true;
for(auto line : lines) {
// append newline
if(is_first) {
is_first = false;
} else {
new_string.insert(new_string.end(), line_ending.begin(), line_ending.end());
}
// append unindented line
auto unindented = line.substr(common_space_sequence.size());
new_string.insert(new_string.end(), unindented.begin(), unindented.end());
}
// add null terminator
new_string.push_back(null_char<char_type>);
return new_string;
}
// returns the size required for the unindented string
template<class char_type>
consteval std::size_t unindent_string_size(string_view<char_type> str) {
return unindent_string(str).size();
}
// simple type that stores a raw string
// we need this to get around the limitation that string literals
// are not considered valid non-type template arguments.
template<class _char_type, std::size_t size>
struct string_wrapper {
using char_type = _char_type;
consteval string_wrapper(const char_type (&arr)[size]) {
std::ranges::copy(arr, str);
}
char_type str[size];
};
// used for sneakily creating and storing
// the unindented string in a template parameter.
template<string_wrapper sw>
struct unindented_string_wrapper {
using char_type = typename decltype(sw)::char_type;
static constexpr std::size_t buffer_size = unindent_string_size<char_type>(sw.str);
using array_ref = const char_type (&)[buffer_size];
consteval unindented_string_wrapper(int) {
auto newstr = unindent_string<char_type>(sw.str);
std::ranges::copy(newstr, buffer);
}
consteval array_ref get() const {
return buffer;
}
char_type buffer[buffer_size];
};
// uses a defaulted template argument that depends on the str
// to initialize the unindented string within a template parameter.
// this enables us to return a reference to the unindented string.
template<string_wrapper str, unindented_string_wrapper<str> unindented = 0>
consteval decltype(auto) do_unindent() {
return unindented.get();
}
// the actual user-defined string literal operator
template<string_wrapper str>
consteval decltype(auto) operator"" _M() {
return do_unindent<str>();
}
}
using multiline_raw_string::operator"" _M;
Upvotes: 7
Reputation: 16242
Yep, that is annoying. Perhaps there should be raw literals (R"PREFIX(") and multiline raw literals (M"PREFIX).
I came up with this alternative which almost describes itself:
#include<iterator> // std::next
...
{
...
...
std::string atoms_text =
std::next/*_line*/(R"XYZ(
O123 12.4830720891 13.1055820441 9.5288258996
O123 13.1055820441 13.1055820441 9.5288258996
)XYZ");
assert( atoms_text[0] != '\n' );
...
}
Limitations: std::next is constexpr only from C++17. If you need a constant expression before that, you can use 1+(char const*)R"XYZ(", but it is not as clear and might produce a warning.
constexpr auto atom_text = 1 + (R"XYZ(
O123 12.4830720891 13.1055820441 9.5288258996
O123 13.1055820441 13.1055820441 9.5288258996
)XYZ");
Also, no warranties ;) . After all, I don't know if it is legal to do arithmetic with pointers to static data.
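On the legality question, one way to gain some confidence is to let the compiler evaluate the arithmetic in a constant expression, since constant evaluation has to reject core-language undefined behavior such as out-of-bounds pointer arithmetic. The following is only a sketch; skip_first_char is a name invented here.

```cpp
// Hypothetical helper: step over the first character of a literal.
// A string literal is an array of const char, and moving a pointer within
// that array is ordinary in-bounds pointer arithmetic.
constexpr const char* skip_first_char(const char* s) {
    return s + 1;
}

constexpr const char* text = skip_first_char(R"(
first line
second line
)");

// If the arithmetic above were undefined, these would not compile.
static_assert(text[0] == 'f');
static_assert(text[-1] == '\n');  // the skipped newline is still in bounds
```

Note that this only holds as long as the literal is not empty; for an empty literal the pointer would end up one past the terminator and must not be dereferenced.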
Another advantage of the + 1 approach is that it can be put at the end:
constexpr auto atom_text = R"XYZ(
O123 12.4830720891 13.1055820441 9.5288258996
O123 13.1055820441 13.1055820441 9.5288258996
)XYZ" + 1;
Possibilities are endless:
constexpr auto atom_text = &R"XYZ(
O123 12.4830720891 13.1055820441 9.5288258996
O123 13.1055820441 13.1055820441 9.5288258996
)XYZ"[1];
constexpr auto atom_text = &1[R"XYZ(
O123 12.4830720891 13.1055820441 9.5288258996
O123 13.1055820441 13.1055820441 9.5288258996
)XYZ"];
Upvotes: 1
Reputation: 106106
You can get a pointer to the 2nd character, skipping the leading newline, by adding 1 to the const char* to which the string literal is automatically converted:
some_code();
std::string text = 1 + R"(
This is the first line.
This is the second line.
This is the third line.
)";
more_code();
IMHO, the above is flawed in that it breaks with the indentation of the surrounding code. Some languages provide a built-in or library function that removes an empty leading line and strips the leading indentation from the following lines.
That allows usage like:
some_code();
std::string text = unindent(R"(
This is the first line.
This is the second line.
This is the third line.
)");
more_code();
Writing such a function is relatively simple...
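#include <cctype>   // std::isspace
#include <cstddef>  // std::size_t
#include <string>

std::string unindent(const char* p)
{
    std::string result;
    // skip the newline that directly follows the opening R"(
    if (*p == '\n') ++p;
    // measure the indentation of the first line
    const char* p_leading = p;
    // cast to unsigned char: passing a negative char to std::isspace is undefined
    while (std::isspace(static_cast<unsigned char>(*p)) && *p != '\n')
        ++p;
    std::size_t leading_len = p - p_leading;
    while (*p)
    {
        result += *p;
        if (*p++ == '\n')
        {
            // skip the same leading sequence on the next line, if present
            for (std::size_t i = 0; i < leading_len; ++i)
                if (p[i] != p_leading[i])
                    goto dont_skip_leading;
            p += leading_len;
        }
      dont_skip_leading: ;
    }
    return result;
}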
(The slightly weird p_leading[i] approach is intended to make life for people who use tabs and spaces no harder than they make it for themselves ;-P, as long as the lines start with the same sequence.)
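If you would rather not depend on the first line's indentation, a variant can strip the smallest indentation found on any non-blank line instead. This is a different technique from the function above and only a sketch: unindent_common is a name invented here, it assumes C++17 for std::string_view, and it counts leading spaces and tabs per character, so mixed tabs and spaces are not reconciled.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: drops one leading newline, then removes the smallest
// indentation shared by all non-blank lines.
std::string unindent_common(std::string_view text) {
    if (!text.empty() && text.front() == '\n') text.remove_prefix(1);

    // Split into lines (without the '\n' separators).
    std::vector<std::string_view> lines;
    while (true) {
        std::size_t nl = text.find('\n');
        lines.push_back(text.substr(0, nl));
        if (nl == std::string_view::npos) break;
        text.remove_prefix(nl + 1);
    }

    // Smallest number of leading spaces/tabs over all non-blank lines.
    std::size_t indent = std::string_view::npos;
    for (std::string_view line : lines) {
        std::size_t first = line.find_first_not_of(" \t");
        if (first != std::string_view::npos) indent = std::min(indent, first);
    }
    if (indent == std::string_view::npos) indent = 0;

    // Re-join the lines; blank lines shorter than the indent become empty.
    std::string result;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        std::string_view line = lines[i];
        line.remove_prefix(std::min(indent, line.size()));
        result += line;
        if (i + 1 < lines.size()) result += '\n';
    }
    return result;
}
```

With this variant, a whitespace-only last line (the one holding the closing delimiter) is reduced to an empty line, so the result ends in a single newline.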
Upvotes: 33
Reputation: 1486
The accepted answer produces the warning cppcoreguidelines-pro-bounds-constant-array-index from clang-tidy. See Pro.bounds: Bounds safety profile for details.
If you don't have std::span but you're at least compiling with C++17, consider:
constexpr auto text = std::string_view(R"(
This is the first line.
This is the second line.
This is the third line.
)").substr(1);
The main advantages are readability (IMHO) and that you can turn on that clang-tidy warning in the rest of your code.
With gcc, if someone inadvertently reduces the raw string to an empty string, you get a compiler error with this approach, while the accepted approach either produces nothing or, depending on your compiler settings, an "outside bounds of constant string" warning.
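If a hard error on an accidentally empty literal is not what you want, the stripping can be made conditional with a small constexpr helper. This is only a sketch under the same C++17 assumption; drop_leading_newline is a name invented here.

```cpp
#include <string_view>

// Hypothetical helper: removes the first character only if it is a newline,
// so an empty or differently formatted literal passes through unchanged.
constexpr std::string_view drop_leading_newline(std::string_view s) {
    if (!s.empty() && s.front() == '\n') s.remove_prefix(1);
    return s;
}

constexpr auto text = drop_leading_newline(R"(
This is the first line.
This is the second line.
)");

static_assert(text.front() == 'T');
static_assert(drop_leading_newline("").empty());
static_assert(drop_leading_newline("abc") == "abc");
```

The trade-off is that a missing leading newline is silently accepted, whereas substr(1) on an empty view fails to compile in a constant expression.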
Upvotes: 4
Reputation: 841
I had the very same problem and I think the following solution is the best of all the above. I hope it'll be helpful for you, too (see example in the comment):
#include <cstddef>  // std::size_t
#include <sstream>  // std::stringstream
#include <string>

/**
 * Strips a multi-line string's indentation prefix.
 *
 * Example:
 * \code
 * string s = R"(|line one
 *              |line two
 *              |line three
 *              |)"_multiline;
 * std::cout << s;
 * \endcode
 *
 * This prints three lines: @c "line one\nline two\nline three\n"
 *
 * @author Christian Parpart
 */
// The length parameter of a string literal operator must be std::size_t;
// unsigned long only happens to be the same type on some platforms.
inline std::string operator ""_multiline(const char* text, std::size_t /*size*/) {
    if (!*text)
        return {};
    enum class State {
        LineData,
        SkipUntilPrefix,
    };
    constexpr char LF = '\n';
    State state = State::LineData;
    std::stringstream sstr;
    // the first character of the literal is the prefix marker, e.g. '|'
    char sep = *text++;
    while (*text) {
        switch (state) {
            case State::LineData: {
                // copy everything; after a newline, start skipping indentation
                if (*text == LF) {
                    state = State::SkipUntilPrefix;
                    sstr << *text++;
                } else {
                    sstr << *text++;
                }
                break;
            }
            case State::SkipUntilPrefix: {
                // drop characters up to and including the prefix marker
                if (*text == sep) {
                    state = State::LineData;
                    text++;
                } else {
                    text++;
                }
                break;
            }
        }
    }
    return sstr.str();
}
Upvotes: 1
Reputation: 137810
The closest I can see is:
std::string text = ""
R"(This is the first line.
This is the second line.
This is the third line.
)";
It would be a bit nicer if whitespace were allowed in the delimiter sequence. Give or take the indentation:
std::string text = R"
(This is the first line.
This is the second line.
This is the third line.
)
";
My preprocessor will let you off with a warning about this, but unfortunately it's a bit useless. Clang and GCC get thrown off completely.
Upvotes: 4
Reputation: 17708
I recommend @Brian's answer, especially if you only need a few lines of text, or an amount you can handle with your text editor-fu. I have an alternative if that isn't the case.
std::string text =
"\
This is the first line." R"(
This is the second line.
This is the third line.)";
Raw string literals can still concatenate with "normal" string literals, as shown in the code. The "\ at the start is meant to "eliminate" the " character from the first line, putting it in a line of its own instead.
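One thing to keep in mind when mixing the two kinds of literal is that escape sequences are only processed in the ordinary part. A small sketch to illustrate this (the variable name mixed is made up, and std::string_view assumes C++17):

```cpp
#include <string_view>

// Ordinary part: "\n" is an escape sequence and becomes one character.
// Raw part: the backslash and the 'n' are kept as two separate characters.
constexpr std::string_view mixed = "first\n" R"(second\n)";

static_assert(mixed.size() == 14);  // 6 from the ordinary part + 8 from the raw part
static_assert(mixed[5] == '\n');
static_assert(mixed[12] == '\\' && mixed[13] == 'n');
```

So any line break that should come from the ordinary literal has to be written as \n there, while the raw part takes its line breaks from the source text.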
Still, if it were up to me, I would put that much text into a separate file and load it at runtime. No pressure though :-).
Also, that is some of the uglier code I've written lately.
Upvotes: 4
Reputation: 119184
This is probably not what you want, but just in case, you should be aware of automatic string literal concatenation:
std::string text =
"This is the first line.\n"
"This is the second line.\n"
"This is the third line.\n";
Upvotes: 11