Reputation: 91
I have one question. I'm writing some code in C, on UNIX. I need to write a special character to a file, because I need to divide my file into small sections.
Example:
'SPECIAL_CHARACTER'
section 1 with some text
'SPECIAL_CHARACTER'
section 2 with some text
etc..
I was thinking of using the character '\1'. It seems to work, but is it OK? Or is it wrong?
What should I use to do this without resorting to characters like "\0" or "\n"?
Upvotes: 0
Views: 1010
Reputation: 1702
I hear two different questions in your post: "how can I designate a separator byte in my code?" and "what is a good choice for a separator byte?"
First, fundamentally, what you are asking about is covered in section 6.4.4.4 of the C language specification (numbering as in C99 and C11), which covers character constants. There are various places you can look up the formal C language spec, or you can search for "C Character Constants" for a perhaps friendlier description.
In detail, a handful of letters can be used in escape sequences to stand in for single bytes of specific values; e.g., \n is one of those, as a stand-in for 0x0a (decimal 10), a byte designated (in ASCII) as a newline. Here are the legal ones:
\a \b \f \n \r \t \v
The escape sequences \0 and \1 work because C supports using \ followed by one to three octal digits (0 through 7) as an octal value. So that will also work with, say, \3 and \35, but not \9, because 9 is not an octal digit. Note that \35 has a decimal value of 29 (3*8 + 5). (Search for "octal values" if you don't immediately see why that's the case.)
There are other legal escape sequences:
\' \" \\ \? : ' " \ and ?, respectively
\xNNNN... : each 'N' can be a hexadecimal digit
And, of course, escape sequences are just one aspect of C character constants.
Second, whether you should use a given byte value as your file's section separator depends entirely on how your program will be used. As others have pointed out in the comments, there are established conventions for which byte values to use for this sort of thing.
I personally agree that 0x1e makes perhaps the most sense since in ASCII it is the "record separator". Conforming to ASCII can matter if the data will need to be understood by other programs, or if your program will need to be understood by other people.
On the other hand, a simple code comment can make it clear to anyone reading your code what byte value you are using for separating sections of your data file, and any program that needs to understand your data files needs to 'know' a lot more about the file format than just what the record separator is. There is nothing magical about 0x1e: it is merely a convention, and a reserved spot on the ASCII table to facilitate a common need -- that is, record separation of text that could contain normal text separators like space, newline, and null.
Broadly, any byte value that won't show up in the contents of your sections would make a fine section separator. Since you say those contents will be text, there are well over 100 choices, even if you exclude \0 (0x00) and \n (0x0a). In ASCII, a handful of byte values have been set aside for this sort of purpose (for example, the file, group, record, and unit separators at 0x1c through 0x1f), so that helps reduce the choice from several dozen to just several. Even among those several, there are only a few commonly used as separators.
Upvotes: 2