Josh C

Reputation: 1111

Writing to a File, Separating Values when Reading

I am trying to teach myself to program, and one early subject I'm having trouble wrapping my head around is file IO.

So far I know I can save data separated by commas so that I can read it back successfully later, doing something like the following:

int x[] = {1,22,333,4444,55555};
std::ofstream FileWriter("data.txt"); // must be opened before writing; the file name is a placeholder
std::string dataName = "One through Five";
for( int i = 0; i < 5; ++i)
{
    FileWriter << x[i] << ',';
}
FileWriter << dataName << std::endl;
FileWriter.close();

That is certainly easy enough, but that seems tacky and actually kind of inefficient. I wonder if there is a better way to save data and still have it separable when I go to read it later.

Unfortunately the searching I have done has only yielded the following:

std::getline( FileReader, myStringBuffer, ',');

So is it possible that I can store separate data points, such that they are distinguishable as separate at read time, without the use of a delimiting character at save time? That is to say, without separating data with some character or white-space.

Upvotes: 1

Views: 1056

Answers (3)

Darren

Reputation: 253

Basic binary file I/O practices will do what you want. When reading binary data, you need to know the size of each piece of data you are reading, and this will drive a lot of your design decisions. If you want to read an int followed by a string, you need the size of an int (easily found with sizeof(int)) so that you can 'nibble off' an int's worth of data from the binary blob you are loading. After that, you need to know how big the string is. Since strings vary in length, you either assume a fixed length (yuck!) or read the length first from a field of known size, then read that many bytes into the string.

So your data writer needs to write the length of each string (or any other variable-sized data type) before the string itself, using a known data size (say, an unsigned int).
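A minimal sketch of that length-prefix idea, assuming a 32-bit length field and that the file is read on the same kind of machine that wrote it (raw bytes are stored in native byte order). The helper names writeInt, readInt, writeString and readString are made up for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>

// Write a fixed-size integer as raw bytes (native byte order).
void writeInt(std::ostream& out, std::int32_t value) {
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
}

// Read back exactly sizeof(std::int32_t) bytes; returns false on failure.
bool readInt(std::istream& in, std::int32_t& value) {
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    return static_cast<bool>(in);
}

// Write the length as a known-size field, then the characters themselves.
void writeString(std::ostream& out, const std::string& text) {
    assert(text.size() <= std::numeric_limits<std::uint32_t>::max());
    const std::uint32_t length = static_cast<std::uint32_t>(text.size());
    out.write(reinterpret_cast<const char*>(&length), sizeof length);
    out.write(text.data(), static_cast<std::streamsize>(length));
}

// Read the length first, then that many bytes into the string.
bool readString(std::istream& in, std::string& text) {
    std::uint32_t length = 0;
    if (!in.read(reinterpret_cast<char*>(&length), sizeof length)) {
        return false;
    }
    text.resize(length);
    if (length > 0) {
        in.read(&text[0], static_cast<std::streamsize>(length));
    }
    return static_cast<bool>(in);
}
```

These work with any stream, for example a std::stringstream or a std::ofstream/std::ifstream opened with std::ios::binary. Because the string carries its own length, it may contain commas, spaces or newlines without confusing the reader.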

For clever ways to organize binary data for reading and writing, check out the Interchange File Format (IFF). It is what RIFF, AIFF, and similar formats are based on, and it is a great way to design a binary file blob. It basically stores data as "chunks": each chunk is written as a chunk ID, then the size of the chunk in bytes, then the data for that chunk. This way, a reader program can check chunk IDs, and if it doesn't want to handle (or doesn't know how to handle) a certain type of data, it can skip ahead by the chunk size in bytes and read the next chunk.
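Here is a rough sketch of the chunk idea, not the actual IFF specification: it assumes a 4-character ID followed by a 32-bit size in native byte order, and it leaves out the padding and byte-order rules that real IFF files follow. writeChunk and findChunk are hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>

// Write one chunk: 4-byte ID, 4-byte payload size, then the payload.
void writeChunk(std::ostream& out, const char (&id)[5], const std::string& payload) {
    assert(payload.size() <= std::numeric_limits<std::uint32_t>::max());
    const std::uint32_t size = static_cast<std::uint32_t>(payload.size());
    out.write(id, 4);
    out.write(reinterpret_cast<const char*>(&size), sizeof size);
    out.write(payload.data(), static_cast<std::streamsize>(size));
}

// Scan forward for the first chunk whose ID matches, skipping all others.
bool findChunk(std::istream& in, const char (&wanted)[5], std::string& payload) {
    char id[4];
    std::uint32_t size = 0;
    while (in.read(id, 4) && in.read(reinterpret_cast<char*>(&size), sizeof size)) {
        if (std::memcmp(id, wanted, 4) == 0) {
            payload.resize(size);
            if (size > 0) {
                in.read(&payload[0], static_cast<std::streamsize>(size));
            }
            return static_cast<bool>(in);
        }
        // Unknown or unwanted chunk: jump over its payload.
        in.ignore(static_cast<std::streamsize>(size));
    }
    return false;
}
```

The reader never has to understand the contents of a chunk it skips, which is what lets old readers cope with files that contain newer chunk types.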

Upvotes: 1

jrd1

Reputation: 10726

If I understand your question correctly, you wish to know how to store data in a format that you can read again (presumably in another C++ program).

If so, then there are a number of ways in which you can do this. The most common (and simplest) are to separate the values with:

  1. Whitespace, e.g.: value1 value2 value3
  2. Commas (this produces a comma-separated values file, more commonly known as a CSV file), e.g.: 1,2,3
  3. Or any other character, e.g.: 1#2#3

That way you can use std::getline like you did (e.g. for CSV):

char delim = ',';

while(std::getline(input_stream, temporary_string, delim)) {
    //data handling goes here...
}

Of course, this is a naive example that assumes your data is one flat run of delimited values. For more multifaceted data that spans more than one line, you'll have to adapt your code, often by reading the data in chunks (a line at a time, say) and parsing those chunks according to your format.
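One way to sketch that chunk-then-parse approach for table-like data is to read a line at a time and then split each line on the delimiter. readRows is a made-up name, and this version does not handle quoted fields or a trailing empty field (std::getline drops it):

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read every line of the stream and split it into fields on `delim`.
std::vector<std::vector<std::string>> readRows(std::istream& in, char delim = ',') {
    assert(delim != '\n');  // the field delimiter must differ from the line terminator
    std::vector<std::vector<std::string>> rows;
    std::string line;
    while (std::getline(in, line)) {          // outer loop: one chunk per line
        std::vector<std::string> fields;
        std::istringstream lineStream(line);
        std::string field;
        while (std::getline(lineStream, field, delim)) {  // inner loop: split the chunk
            fields.push_back(field);
        }
        rows.push_back(fields);
    }
    return rows;
}
```

Each field comes back as a string; numeric fields can then be converted with std::stoi or std::stod as needed.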

Complex Example (Satellite Coordinates):

1.1 1.2 1.3 1.4 1.5
1.6   1.7 1.8 1.9
2.0        
2.0
2.1 2.2 2.3 2.4 2.5
2.1     2.4

This is whitespace-delimited and has the following format:

  1. Every data point is stored in the following pattern: data, space.
  2. If a data point doesn't exist, it is represented by a space, unless every remaining data point on the line is missing, in which case the line is truncated with a newline.

Upvotes: 1

Dietmar Kühl

Reputation: 154015

Separating values by specific characters can work, depending on your values: if your strings don't use the separating character, e.g., a comma, you can save the values using a comma as a separator. Things become interesting when there is no character which is known to be a safe separator. The typical approach in that case is to use quoting together with a suitable escape character, e.g., what C and C++ use to specify string literals:

  • Normally strings start and end with a double quote, ".
  • To embed a quote in a string, it is escaped with a backslash, e.g., "\"".
  • To embed the escape character itself (the backslash), two backslashes are used, e.g., "\\".
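If C++14 is available, std::quoted from <iomanip> applies exactly these rules when writing and undoes them when reading. A small sketch, where encodeRecord and decodeRecord are hypothetical helpers and the record layout (an int, a comma, a quoted string) is only an assumption for the example:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Write "id,<quoted name>"; std::quoted adds the surrounding quotes and
// escapes any embedded quote or backslash with a backslash.
std::string encodeRecord(int id, const std::string& name) {
    std::ostringstream out;
    out << id << ',' << std::quoted(name);
    return out.str();
}

// Parse the same layout back; std::quoted strips the quotes and escapes.
bool decodeRecord(const std::string& line, int& id, std::string& name) {
    std::istringstream in(line);
    char separator = 0;
    const bool parsed = static_cast<bool>(in >> id >> separator >> std::quoted(name));
    return parsed && separator == ',';
}
```

With this, a name such as Five, "quoted" survives the round trip even though it contains both the separator and the quote character.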

Another approach which is sometimes used is to combine the values with a size prefix. What is used depends on the exact needs, though.

When you use a comma as a separator, you probably need something to skip the commas when reading formatted values, e.g., integers: formatted input won't read past a comma on its own, and silently ignoring the separator is probably not appropriate, because a missing comma is clearly a format error. You might want to use a manipulator that extracts a comma when one is present and fails otherwise:

std::istream& comma(std::istream& in) {
    std::istream::sentry cerberos(in);
    if (in && in.peek() == ',') {
        in.ignore();
    }
    else {
        in.setstate(std::ios_base::failbit);
    }
    return in;
}
// ...
int i, j;
if (in >> i >> comma >> j) { ... }

The input expression should read two comma separated integers and fail if either value isn't an int or they are not separated by a comma.

Upvotes: 1
