Reputation: 3523
I'm trying to reimplement dos2unix and unix2dos in C++. Here's my dos2unix:
#include <cstdlib>   // system, EXIT_SUCCESS
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
// save as d2u.cpp, compile '$ g++ d2u.cpp -o d2u'
// execute '$ ./d2u sample.txt'
int main(int argc, char** argv) {
    string fn = "";
    char c;
    if (argc == 2) { fn = argv[1]; }
    ifstream is(fn.c_str());
    ofstream os("temp.txt");
    while (is >> c) {
        switch (c) {
        // 0x0D = '\r', 0x0A = '\n'
        case 0x0D: break;
        case 0x0A: os << (char)0x0A; break;
        default: os << c; break;
        }
    }
    is.close(); os.close();
    string command = "mv temp.txt " + fn;
    system(command.c_str());
    return EXIT_SUCCESS;
}
Since DOS text files end each line with \r\n, I want to drop the \r and write only the \n to the new file. Testing it on a text file and comparing the hexdumps, however, shows that it removes every \r and every \n instead:
Input (DOS line endings):
74 65 73 74 0d 0a 74 65 73 74 32 0d 0a 74 65 73 74 33
t  e  s  t  \r \n t  e  s  t  2  \r \n t  e  s  t  3

Actual output:
74 65 73 74 74 65 73 74 32 74 65 73 74 33
t  e  s  t  t  e  s  t  2  t  e  s  t  3

Expected output:
74 65 73 74 0a 74 65 73 74 32 0a 74 65 73 74 33
t  e  s  t  \n t  e  s  t  2  \n t  e  s  t  3
Why does this happen? I get similar behavior with my implementation of unix2dos.
Upvotes: 0
Views: 658
Reputation: 84589
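To avoid having >> eliminate whitespace from your input, the easiest change is simply to use is.get(c) instead of is >> c. std::basic_istream::get is an unformatted input function and gives you a character-by-character read of everything in the file. The operator>> for char, on the other hand, is a formatted input function: while the std::skipws flag is set (the default), it skips any leading whitespace before extracting a character, and both '\r' and '\n' count as whitespace. That is why every \r and \n vanishes from your output; spaces and tabs in the file would be dropped as well.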
Changing to is.get(c) gives the behavior you describe:
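#include <cstdlib>   // std::system
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
    if (argc < 2) { /* validate filename provided */
        std::cerr << "error: filename required.\n";
        return 1;
    }
    std::string fn = argv[1];
    char c;
    // binary mode: no newline translation on platforms that do it (e.g. Windows)
    std::ifstream is(fn, std::ios::binary);
    std::ofstream os("temp.txt", std::ios::binary);
    while (is.get(c))        // unformatted read: every byte, whitespace included
        if (c != '\r')       // drop carriage returns, keep everything else
            os.put(c);
    is.close();
    os.close();              // flush temp.txt before it is renamed
    std::string command = "mv temp.txt " + fn;
    std::system(command.c_str());
}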
Example Input File
$ cat dat/fleas2line.txt
my dog has fleas
my cat has none
Example Use/Output File
You can see that the '\n' characters are preserved in the output:
$ hexdump -Cv temp.txt
00000000 6d 79 20 64 6f 67 20 68 61 73 20 66 6c 65 61 73 |my dog has fleas|
00000010 0a 6d 79 20 63 61 74 20 68 61 73 20 6e 6f 6e 65 |.my cat has none|
00000020 0a |.|
$ cat temp.txt
my dog has fleas
my cat has none
Lastly, avoid using 0x0D and 0x0A in your code; use the characters themselves, e.g. '\r' and '\n'. It makes the code much more readable. Look things over and let me know if you have further questions.
Upvotes: 2