Amirreza A.

Reputation: 761

What is the proper format of writing raw strings with '$' in C++?

I'm learning about raw strings in C++ from the cplusplus.com tutorial on constants. According to that site, a raw string starts with R"sequence( and ends with )sequence", where sequence can be any sequence of characters.

One of the examples on the website is the following:

R"&%$(string with \backslash)&%$"

However, when I try to compile the code that contains the above raw string, I get a compilation error.

test.cpp:5:28: error: invalid character '$' in raw string delimiter
    5 |     std::string str = R"&%$(string with \backslash)&%$";
      |                       ^
test.cpp:5:23: error: stray 'R' in program

I tried it with g++ and clang++ on both Windows and Linux. None of them worked.

Upvotes: 29

Views: 2722

Answers (4)

According to cppreference, C++26 adds three more characters to the basic character set:

The following characters are added to the basic character set since C++26:

Code unit Character Glyph
U+0024 Dollar Sign $
U+0040 Commercial At @
U+0060 Grave Accent `

The rules for raw string delimiters allow any character from the basic character set, with no exception for the dollar sign $, so $ becomes a valid delimiter character starting with C++26.

d-char-seq - A sequence of one or more d-chars, at most 16 characters long
d-char - A character from the basic character set, except parentheses, backslash and spaces

Upvotes: 1

Rohith V

Reputation: 1123

Just remove the $ from the delimiter, as in the code below:

string string3 = R"&%(string with \backslash)&%";
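As a minimal sketch of the fix above (the function names are hypothetical, chosen only for illustration), the following compares a raw string using the `&%` delimiter, a raw string with an empty delimiter, and the equivalent ordinary literal with an escaped backslash. All three should hold the same 22 characters:

```cpp
#include <string>

// Raw string with the delimiter "&%": both characters belong to the
// basic source character set, so this is accepted.
inline std::string with_delimiter() {
    return R"&%(string with \backslash)&%";
}

// The delimiter may also be empty.
inline std::string without_delimiter() {
    return R"(string with \backslash)";
}

// Ordinary string literal: the backslash has to be escaped by hand.
inline std::string escaped() {
    return "string with \\backslash";
}
```

Calling `with_delimiter() == escaped()` from a small test program is a quick way to confirm that the delimiter itself never ends up in the string contents.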

As mentioned in the comments, $ causes an error because it is not part of the basic source character set:

  1. The individual bytes of the source code file are mapped (in implementation-defined manner) to the characters of the basic source character set. In particular, OS-dependent end-of-line indicators are replaced by newline characters. The basic source character set consists of 96 characters:

a) 5 whitespace characters (space, horizontal tab, vertical tab, form feed, new-line)

b) 10 digit characters from '0' to '9'

c) 52 letters from 'a' to 'z' and from 'A' to 'Z'

d) 29 punctuation characters: _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

  2. Any source file character that cannot be mapped to a character in the basic source character set is replaced by its universal character name (escaped with \u or \U) or by some implementation-defined form that is handled equivalently.

Reference

Upvotes: 1

ph3rin

Reputation: 4896

From cppreference:

delimiter: A character sequence made of any source character but parentheses, backslash and spaces (can be empty, and at most 16 characters long)

Note the "any source character" part here.

Let us look at what the standard says:

From [gram.lex]:

raw-string:
  " d-char-sequence(opt) ( r-char-sequence(opt) ) d-char-sequence(opt) "

...

d-char-sequence:
  d-char
  d-char-sequence d-char

d-char:
  any member of the basic source character set except: space, the left parenthesis (, the right parenthesis ), the backslash \, and the control characters representing horizontal tab, vertical tab, form feed, and newline.

Well, what is the basic source character set? From [lex.charset]:

The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:

a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

... which does not include $; so the conclusion is that the dollar sign $ cannot be part of the delimiter sequence.
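The rule quoted above can be sketched as a small checker. This is only an illustration under stated assumptions: `is_valid_raw_delimiter` is a hypothetical helper, not a standard or compiler facility. It applies the pre-C++26 wording literally, including the 16-character limit from the cppreference quote:

```cpp
#include <string_view>

// Hypothetical helper: reports whether `delim` could be used as a raw
// string delimiter under the pre-C++26 rules quoted above.
constexpr bool is_valid_raw_delimiter(std::string_view delim) {
    // The 91 graphical characters of the basic source character set,
    // minus '(' , ')' and '\\'. Whitespace and control characters are
    // excluded simply by not being listed.
    constexpr std::string_view allowed =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789"
        "_{}[]#<>%:;.?*+-/^&|~!=,\"'";

    // A delimiter is at most 16 characters long (it may be empty).
    if (delim.size() > 16) {
        return false;
    }
    for (char c : delim) {
        if (allowed.find(c) == std::string_view::npos) {
            return false;  // e.g. '$', '@', '`', space, parentheses
        }
    }
    return true;
}
```

With this sketch, `is_valid_raw_delimiter("&%")` yields true while `is_valid_raw_delimiter("&%$")` yields false, matching the compiler error in the question.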

Upvotes: 25

heap underrun

Reputation: 2489

For the basic source character set, see lex.charset 5.3 (1): that set does not contain the $ character. For the allowed prefix characters in raw string literals, see lex.string 5.13.5: "/…/ any member of the basic source character set except: space, the left parenthesis (, the right parenthesis ), the backslash \, and the control characters representing horizontal tab, vertical tab, form feed, and newline." (emphasis mine).

Upvotes: 4
