J.S.Orris

Reputation: 4821

Replace tabs ("\t") in flat file with "Unit Separator" (0x1f) in C#

I have been having trouble finding the metacharacter for the 'Unit Separator' to replace the tabs in a flat file.

So far I have this:

File.WriteAllLines(outputFile,
    File.ReadLines(inputFile)
    .Select(t => t.Replace("\t", "\0x1f")));  //this does not work

I have also tried:

File.WriteAllLines(outputFile,
    File.ReadLines(inputFile)
    .Select(t => t.Replace("\t", "\u"))); //also doesn't work

AND

File.WriteAllLines(outputFile,
    File.ReadLines(inputFile)
    .Select(t => t.Replace("\t", 0x1f)));  //also doesn't work

How do I correctly use a hex value as a parameter? Also, what is the metacharacter for the 'Unit Separator'?

Upvotes: 1

Views: 4961

Answers (3)

ashes999

Reputation: 10163

I think the correct way to encode Unicode characters in C# is to use the \uXXXX format (four hex digits). You can try replacing the tab with the string "\u001f", like so:

File.WriteAllLines(outputFile,
    File.ReadLines(inputFile)
    .Select(t => t.Replace("\t", "\u001f")));

Does that work?

Upvotes: 0

psoshmo

Reputation: 1550

The code point for the unit separator is

U+001F

In C# you can write it with the escape sequence \u001f, so you should be able to use it like this:

File.WriteAllLines(outputFile,
    File.ReadLines(inputFile)
    .Select(t => t.Replace("\t", "\u001f")));
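
As a rough sketch of why the attempts in the question fail and which spellings are equivalent: "\0x1f" is a null character followed by the literal text x1f, "\u" on its own does not compile because the escape needs four hex digits, and 0x1f is an int, which string.Replace has no overload for. The class and member names below are hypothetical, chosen only for illustration.

```csharp
public static class UnitSeparatorDemo
{
    // Three equivalent ways to spell U+001F as a char.
    public const char FromUnicodeEscape = '\u001f'; // always exactly four hex digits
    public const char FromHexEscape = '\x1f';       // \x takes 1 to 4 hex digits, so be careful inside longer strings
    public const char FromCast = (char)0x1f;        // cast the integer value to char

    // "\0x1f" is NOT the unit separator: it is '\0' followed by 'x', '1', 'f'.
    public static int BrokenLiteralLength() => "\0x1f".Length;

    // Replace every tab in a single line with the unit separator.
    public static string ReplaceTabs(string line) => line.Replace('\t', FromUnicodeEscape);
}
```

Calling UnitSeparatorDemo.ReplaceTabs("a\tb") should give a three-character string whose middle character has the value 31 (0x1f).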

EDIT: Since a discussion about control characters has started, I'll add this definition for posterity's sake.

A special, non-printing character that begins, modifies, or ends a function, event, operation or control operation. The ASCII character set defines 32 control characters. Originally, these codes were designed to control teletype machines. Now, however, they are often used to control display monitors, printers, and other modern devices.

from here.

Also, here is a description of the unit separator:

The smallest data items to be stored in a database are called units in the ASCII definition. We would call them field now. The unit separator separates these fields in a serial data storage environment. Most current database implementations require that fields of most types have a fixed length. Enough space in the record is allocated to store the largest possible member of each field, even if this is not necessary in most cases. This costs a large amount of space in many situations. The US control code allows all fields to have a variable length. If data storage space is limited—as in the sixties—this is a good way to preserve valuable space. On the other hand is serial storage far less efficient than the table driven RAM and disk implementations of modern times. I can't imagine a situation where modern SQL databases are run with the data stored on paper tape or magnetic reels...

from here.

Upvotes: 4

Martin Noreke

Reputation: 4136

This should get you where you need to be:

// Parse the hex value 0x1f and cast it to a char (the unit separator).
char unitSeparatorChar = (char)Convert.ToInt32("0x1f", 16);
string contents = File.ReadAllText(inputFile);
string convertedContents = contents.Replace('\t', unitSeparatorChar);
File.WriteAllText(outputFile, convertedContents);

I loaded the file into a string, replaced the tabs, and saved the result. You can combine these steps for better memory efficiency.
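
One possible way to combine the steps is to stream the file line by line, as the question already does, and use the char overload of Replace. This is only a sketch: the class and method names are hypothetical, and note that File.WriteAllLines writes the platform's line ending after every line, so the output's line endings may differ from the input's.

```csharp
using System.IO;
using System.Linq;

public static class TabToUnitSeparator
{
    // Hypothetical helper: copies inputFile to outputFile,
    // replacing each tab with the unit separator (U+001F).
    public static void ConvertFile(string inputFile, string outputFile)
    {
        const char unitSeparator = '\u001f';

        // File.ReadLines is lazy, so only one line is held in memory at a time.
        File.WriteAllLines(outputFile,
            File.ReadLines(inputFile)
                .Select(line => line.Replace('\t', unitSeparator)));
    }
}
```

Usage would be something like TabToUnitSeparator.ConvertFile(inputFile, outputFile), with both paths pointing at different files, since the input is still being read while the output is written.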

Upvotes: 0
