marseilles84

Reputation: 426

c# hex byte 0x09 (ascii -> tab) to "\t" string

I need to convert the byte array of a text file to its string character representation.

For example, if I have a text file that has:

hello (tab) there (newline) friend

I would like to convert that to an array:

my_array = {'h', 'e', 'l', 'l', 'o', '\t', 't', 'h', 'e', 'r', 'e', '\r', '\n', 'f', 'r', 'i', 'e', 'n', 'd'};

I'm having trouble with converting the control characters to their escaped strings, i.e.:

  • 0x09 = '\t';
  • 0x0D = '\r';
  • 0x0A = '\n';

I have tried this, but the tabs and new lines aren't represented here:

byte[] text_bytes = File.ReadAllBytes("ok.txt");
char[] y = Encoding.ASCII.GetChars(text_bytes);

I know I can loop through each byte, check for 0x09, and replace it with "\t" when I find it, but I'm wondering if there is something built in.

Upvotes: 3

Views: 23797

Answers (4)

keyboardP

Reputation: 69372

If you want to escape the common white-space control characters, you can use Regex.Escape.

// requires: using System.IO; using System.Text.RegularExpressions;
string myText = File.ReadAllText("ok.txt");

// Regex.Escape only rewrites \t, \n, \r and \f, so match just those;
// to optimize, drop any you know won't be there (e.g. \f)
Regex rx = new Regex(@"[\f\n\r\t]", RegexOptions.Compiled);

myText = rx.Replace(myText, m => Regex.Escape(m.Value));

Console.WriteLine(myText);

You can't convert it to a char array in the way you've posted, because an escaped control character counts as two characters (\ and t). But if you don't mind each of those characters being a separate element, you can simply do:

char[] myCharArray = myText.ToCharArray();
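
If each escape sequence should occupy a single array element, one option is to build a string[] instead of a char[]. The following is only a sketch: EscapeTokens and ToEscapedTokens are hypothetical names, not part of the framework, and only tab, carriage return, and line feed are handled.

```csharp
using System;
using System.Collections.Generic;

public static class EscapeTokens
{
    // Hypothetical helper: returns one string per input character,
    // replacing tab, CR, and LF with their two-character escape sequences.
    public static string[] ToEscapedTokens(string text)
    {
        var tokens = new List<string>(text.Length);
        foreach (char c in text)
        {
            switch (c)
            {
                case '\t': tokens.Add("\\t"); break;
                case '\r': tokens.Add("\\r"); break;
                case '\n': tokens.Add("\\n"); break;
                default: tokens.Add(c.ToString()); break;
            }
        }
        return tokens.ToArray();
    }
}
```

For example, EscapeTokens.ToEscapedTokens("hi\tyou") yields six elements, the third of which is the two-character string \t.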

Upvotes: 1

Jim Mischel

Reputation: 134005

There are several ways you could do it. The simplest would be to load the entire file into memory:

string theText = File.ReadAllText(filename);

Then use string.Replace to replace the items you're interested in:

// "escaping" the '\t' with '\\t' makes it write the literal characters '\' and 't'
theText = theText.Replace("\t", "\\t");

theText = theText.Replace("\r", "\\r");
theText = theText.Replace("\n", "\\n");

Then you can create your array. If you want bytes and you're sure that it's all ASCII text, you can use Encoding.ASCII:

byte[] theBytes = Encoding.ASCII.GetBytes(theText);

Or, if you want a character array:

char[] theChars = theText.ToCharArray();

That's probably going to be fast enough for your purposes. You might be able to speed it up by making a single pass through the string, reading character by character and copying to a StringBuilder:

StringBuilder sb = new StringBuilder(theText.Length);
foreach (char c in theText)
{
    switch (c)
    {
        case '\t' : sb.Append("\\t"); break;
        case '\r' : sb.Append("\\r"); break;
        case '\n' : sb.Append("\\n"); break;
        default : sb.Append(c); break;
    }
}

byte[] theBytes = Encoding.ASCII.GetBytes(sb.ToString());
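
The single-pass approach can be extended to cover every control character, not just the three above. This is a rough sketch under a few assumptions that are not in the original answer: ControlEscaper is a hypothetical name, unlisted control characters fall back to a \uXXXX form, and the backslash itself is escaped so the output stays unambiguous.

```csharp
using System;
using System.Text;

public static class ControlEscaper
{
    // Hypothetical helper: escapes control characters in a single pass.
    public static string Escape(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            switch (c)
            {
                case '\t': sb.Append("\\t"); break;
                case '\r': sb.Append("\\r"); break;
                case '\n': sb.Append("\\n"); break;
                case '\\': sb.Append("\\\\"); break; // assumption: escape the backslash too
                default:
                    if (char.IsControl(c))
                        // Assumption: any other control character becomes \uXXXX (hex)
                        sb.Append("\\u").Append(((int)c).ToString("x4"));
                    else
                        sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, a tab becomes the two characters \t, while a bell character (0x07) becomes \u0007. Drop the backslash case if the output should match the three-replacement version exactly.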

Upvotes: 2

Matthew Watson

Reputation: 109587

If you don't mind it being somewhat slower than a hand-rolled solution, then you could use a CodeDomProvider (which would probably be fast enough).

I found sample code here: http://code.google.com/p/nbehave-cf/source/browse/trunk/CustomTool/StringExtensions.cs?spec=svn5&r=5

using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;

namespace CustomTool
{
    public static class StringExtensions
    {
        public static String ToLiteral(this String input)
        {
            using (var writer = new StringWriter())
            {
                using (var provider = CodeDomProvider.CreateProvider("CSharp"))
                {
                    provider.GenerateCodeFromExpression(new CodePrimitiveExpression(input), writer, null);
                    return writer.ToString();
                }
            }
        }
    }
}

You would use it by reading the string with Encoding.ASCII.GetString(), then calling .ToLiteral() to produce the escaped literal, and finally .ToCharArray() to get the result.

This gives the correct result with, for example:

// You would do (using your sample code):
// string test = Encoding.ASCII.GetString(text_bytes);

string test = "hello\tthere\nfriend";

char[] result = test.ToLiteral().ToCharArray();

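If you inspect result you will see that it contains the escaped characters. Note that the generated literal also includes the surrounding double quotes, so trim them if you don't want them.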

However, I'd just use a loop and a switch statement to convert the characters. It's easy to write and understand, and it'd be much more efficient.

Upvotes: 0

BradleyDotNET

Reputation: 61349

In the "y" array, the "escaped characters" already have their actual values (0x09, 0x0D, etc.); they simply show up as unprintable characters when displayed as text.

When you write \t, \n, \r, etc. in source code, you could equally have written (char)0x09, (char)0x0A, (char)0x0D, and that is what the data actually contains. In other words, "\t" is only source-code notation; it doesn't exist as a character in the data!

Whether you roll your own, or use an existing library, someone is going to have to map 0x09 to the "\t" escape sequence and inject it into your string.
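
A small check makes the point concrete. This is only an illustrative sketch; ControlCharDemo, Decode, and Show are hypothetical names introduced here.

```csharp
using System;
using System.Text;

public static class ControlCharDemo
{
    // Decodes ASCII bytes; control characters come through as single chars.
    public static char[] Decode(byte[] bytes)
    {
        return Encoding.ASCII.GetChars(bytes);
    }

    public static void Show()
    {
        byte[] bytes = { 0x68, 0x69, 0x09, 0x0D, 0x0A }; // 'h', 'i', tab, CR, LF
        char[] chars = Decode(bytes);

        Console.WriteLine(chars.Length);      // 5: one char per byte
        Console.WriteLine(chars[2] == '\t');  // True: the tab is already in the array
        Console.WriteLine((int)chars[2]);     // 9
        Console.WriteLine("\\t".Length);      // 2: the escape sequence is two characters
    }
}
```

The decoded array already holds the tab as a single char with value 9; the two-character text \t only appears once something maps it explicitly.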

Upvotes: 0
