gator

Reputation: 3523

Reimplementing dos2unix and unix2dos in C++; '\r' and '\n' not appearing in hexdump?

I'm trying to reimplement dos2unix and unix2dos in C++. Here's my dos2unix:

dos2unix

#include <cstdlib>   // system, EXIT_SUCCESS
#include <fstream>
#include <iostream>
#include <string>

using namespace std;

// save as d2u.cpp, compile '$ g++ d2u.cpp -o d2u'
// execute '$ ./d2u sample.txt'
int main(int argc, char** argv) {
    string fn ="";
    char c;
    if (argc == 2) { fn = argv[1]; }
    ifstream is(fn.c_str());
    ofstream os("temp.txt");
    while (is >> c) {
        switch(c) {
            // 0x0D = '\r', 0x0A = '\n'
            case 0x0D: break;
            case 0x0A: os << (char)0x0A; break;
            default: os << c; break;
        }
    }
    is.close(); os.close();
    string command = "mv temp.txt " + fn;
    system(command.c_str());
    return EXIT_SUCCESS;
}

Since DOS text files have lines ending in \r\n, I want to drop the \r and write only the \n to the new file. Testing it with a text file and comparing the hexdumps, however, shows that every \r and every \n is removed instead:

Hexdump of input

74 65 73 74 0d 0a 74 65 73 74 32 0d 0a 74 65 73 74 33
t  e  s  t  \r \n t  e  s  t  2  \r \n t  e  s  t  3

Hexdump of output

74 65 73 74 74 65 73 74 32 74 65 73 74 33
t  e  s  t  t  e  s  t  2  t  e  s  t  3

Hexdump of expected output

74 65 73 74 0a 74 65 73 74 32 0a 74 65 73 74 33
t  e  s  t  \n t  e  s  t  2  \n t  e  s  t  3

Why does this happen? I get similar behavior with my implementation of unix2dos.

Upvotes: 0

Views: 658

Answers (1)

David C. Rankin

Reputation: 84589

To keep >> from discarding whitespace in your input, the easiest change is to use is.get(c) instead of is >> c. std::basic_istream::get is an unformatted input function and reads the file character by character, whitespace included. The std::basic_istream operator >> is a formatted input function, which skips leading whitespace by default (the skipws flag), so '\r' and '\n', and spaces as well, never reach your switch.
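The difference is easy to see on an in-memory stream. The following is only a minimal sketch: the helper names read_formatted, read_unformatted and read_noskipws are made up for illustration, and std::istringstream stands in for the file. The third helper shows that clearing skipws with std::noskipws is another way to keep whitespace while still using >>.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Formatted extraction: operator>> skips leading whitespace by default,
// so '\r', '\n' and ' ' are never stored in c.
std::string read_formatted(const std::string& text) {
    std::istringstream in(text);
    std::string out;
    char c;
    while (in >> c)
        out += c;
    return out;
}

// Unformatted extraction: get() hands back every character unchanged.
std::string read_unformatted(const std::string& text) {
    std::istringstream in(text);
    std::string out;
    char c;
    while (in.get(c))
        out += c;
    return out;
}

// Formatted extraction with the skipws flag cleared: whitespace is kept.
std::string read_noskipws(const std::string& text) {
    std::istringstream in(text);
    in >> std::noskipws;
    std::string out;
    char c;
    while (in >> c)
        out += c;
    return out;
}
```

Calling read_formatted("test\r\ntest2") gives "testtest2", which matches the hexdump in the question, while the other two helpers return the string unchanged.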

Changing to is.get(c) provides the behavior you describe:

#include <cstdlib>      /* std::system */
#include <iostream>
#include <fstream>
#include <string>

int main(int argc, char** argv) {

    std::string fn {};
    char c;

    if (argc < 2) { /* validate filename provided */
        std::cerr << "error: filename required.\n";
        return 1;
    }

    fn = argv[1];

    std::ifstream is (fn.c_str());
    std::ofstream os ("temp.txt");

    while (is.get(c))       /* unformatted read, whitespace is kept */
        if (c != '\r')      /* drop carriage returns only */
            os.put(c);

    is.close();             /* close both files before replacing the input */
    os.close();

    std::string command = "mv temp.txt " + fn;
    std::system(command.c_str());

}

Example Input File

$ cat dat/fleas2line.txt
my dog has fleas
my cat has none

Example Use/Output File

You can see the '\n' is preserved in the output.

$ hexdump -Cv temp.txt
00000000  6d 79 20 64 6f 67 20 68  61 73 20 66 6c 65 61 73  |my dog has fleas|
00000010  0a 6d 79 20 63 61 74 20  68 61 73 20 6e 6f 6e 65  |.my cat has none|
00000020  0a                                                |.|

temp.txt

$ cat temp.txt
my dog has fleas
my cat has none

Lastly, avoid using 0x0D and 0x0A in your code and instead use the characters themselves, e.g. '\r' and '\n'. It makes the code much more readable. Look things over and let me know if you have further questions.
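Since you mention the same problem in your unix2dos, the same get()/put() pattern should carry over to that direction. This is a rough sketch rather than your actual code, which was not posted: the name unix_to_dos and the string overload are hypothetical, and it assumes a '\n' that already follows a '\r' should be left alone so that running it twice does not double the carriage returns.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copy in to out, writing "\r\n" for every '\n' that is not already
// preceded by '\r'. Uses unformatted get()/put() so no whitespace is lost.
void unix_to_dos(std::istream& in, std::ostream& out) {
    char c;
    char prev = '\0';
    while (in.get(c)) {
        if (c == '\n' && prev != '\r')
            out.put('\r');
        out.put(c);
        prev = c;
    }
}

// Convenience overload for in-memory text (hypothetical helper).
std::string unix_to_dos(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    unix_to_dos(in, out);
    return out.str();
}
```

With file streams you would pass the std::ifstream and std::ofstream to the first overload. On a platform whose text mode translates line endings (Windows, for example), opening both files with std::ios::binary is likely needed so the library does not convert them behind your back.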

Upvotes: 2
