SYL

Reputation: 337

Read Unicode string from text file in UWP app

In a Windows 10 app I am trying to read a string from a .txt file and set it as the text of a RichEditBox:

Code variant 1:

var read = await FileIO.ReadTextAsync(file, Windows.Storage.Streams.UnicodeEncoding.Utf8);
txt.Document.SetText(Windows.UI.Text.TextSetOptions.None, read);

Code variant 2:

var stream = await file.OpenAsync(Windows.Storage.FileAccessMode.ReadWrite);
ulong size = stream.Size;
using (var inputStream = stream.GetInputStreamAt(0))
{
    using (var dataReader = new Windows.Storage.Streams.DataReader(inputStream))
    {
        dataReader.UnicodeEncoding = Windows.Storage.Streams.UnicodeEncoding.Utf8;
        uint numBytesLoaded = await dataReader.LoadAsync((uint)size);
        string text = dataReader.ReadString(numBytesLoaded);
        txt.Document.SetText(Windows.UI.Text.TextSetOptions.FormatRtf, text);
    }
}

With some files I get this error: "No mapping for the Unicode character exists in the target multi-byte code page"

I found one solution:

IBuffer buffer = await FileIO.ReadBufferAsync(file);
DataReader reader = DataReader.FromBuffer(buffer);
byte[] fileContent = new byte[reader.UnconsumedBufferLength];
reader.ReadBytes(fileContent);
string text = Encoding.UTF8.GetString(fileContent, 0, fileContent.Length);
txt.Document.SetText(Windows.UI.Text.TextSetOptions.None, text);

But with this code the text is displayed as question marks inside diamonds (the U+FFFD replacement character).

How can I read and display such text files in the correct encoding?

Upvotes: 0

Views: 3341

Answers (3)

Rajesh Barfa

Reputation: 126

        StorageFile file = await StorageFile.GetFileFromApplicationUriAsync(new Uri("ms-appx:///Assets/FontFiles/" + fileName));
        string line;
        using (var inputStream = await file.OpenReadAsync())
        using (var classicStream = inputStream.AsStreamForRead())
        using (var streamReader = new StreamReader(classicStream))
        {
            // StreamReader checks for a byte order mark and otherwise assumes UTF-8
            while (streamReader.Peek() >= 0)
            {
                line = streamReader.ReadLine();
            }
        }

Upvotes: 0

SYL

Reputation: 337

Solution:

1) I made a port of the Mozilla Universal Charset Detector to UWP (published on NuGet):

ICharsetDetector cdet = new CharsetDetector();
cdet.Feed(fileContent, 0, fileContent.Length);
cdet.DataEnd();

2) The NuGet library Portable.Text.Encoding:

string text = null;
if (cdet.Charset != null)
{
    text = Portable.Text.Encoding.GetEncoding(cdet.Charset).GetString(fileContent, 0, fileContent.Length);
}

That's all. Now files in Unicode encodings, as well as cp1251 and cp1252, are read correctly.
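
Before running a detector, it can be worth checking whether the bytes are valid UTF-8 at all, since that is the case where the plain UTF-8 read already works. A minimal sketch using only System.Text; `Utf8Check` and `TryDecodeUtf8` are hypothetical names for illustration, not part of either NuGet package mentioned above:

```csharp
using System;
using System.Text;

public static class Utf8Check
{
    // Hypothetical helper: returns true and the decoded text when the bytes
    // are well-formed UTF-8; returns false instead of producing U+FFFD.
    public static bool TryDecodeUtf8(byte[] bytes, out string text)
    {
        // throwOnInvalidBytes: true makes invalid sequences raise an exception
        // rather than being silently replaced with the replacement character.
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            text = strictUtf8.GetString(bytes, 0, bytes.Length);
            return true;
        }
        catch (DecoderFallbackException)
        {
            text = null;
            return false;
        }
    }
}
```

If `TryDecodeUtf8(fileContent, out text)` returns false, the file is probably in some other encoding, and that is the point where the charset detector would be fed.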

Upvotes: 0

Gian Paolo

Reputation: 66

The challenge here is the encoding, and the right approach depends on how much accuracy your application needs. If you need something fast and simple, you can adapt this answer:

    public static Encoding GetEncoding(byte[] bom)
    {
        // Analyze the BOM
        if (bom[0] == 0x2b && bom[1] == 0x2f && bom[2] == 0x76) return Encoding.UTF7;
        if (bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf) return Encoding.UTF8;
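        if (bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0) return Encoding.UTF32; //UTF-32LE, must be checked before UTF-16LE
        if (bom[0] == 0xff && bom[1] == 0xfe) return Encoding.Unicode; //UTF-16LE
        if (bom[0] == 0xfe && bom[1] == 0xff) return Encoding.BigEndianUnicode; //UTF-16BE
        if (bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff) return new UTF32Encoding(true, true); //UTF-32BE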
        return Encoding.ASCII;
    }

    async System.Threading.Tasks.Task MyMethod()
    {
        FileOpenPicker openPicker = new FileOpenPicker();
        openPicker.FileTypeFilter.Add(".txt"); // the picker requires at least one file type filter
        StorageFile file = await openPicker.PickSingleFileAsync();
        IBuffer buffer = await FileIO.ReadBufferAsync(file);
        DataReader reader = DataReader.FromBuffer(buffer);
        byte[] fileContent = new byte[reader.UnconsumedBufferLength];
        reader.ReadBytes(fileContent);
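        // Copy at most 4 bytes so that files shorter than 4 bytes do not throw
        byte[] bom = new byte[4];
        Array.Copy(fileContent, bom, Math.Min(4, fileContent.Length));
        string text = GetEncoding(bom).GetString(fileContent, 0, fileContent.Length);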
        txt.Document.SetText(Windows.UI.Text.TextSetOptions.None, text);

        //.. 
    }

If you need something more accurate, consider porting to UWP one of the .NET ports of the Mozilla charset detector, as already mentioned in this answer.

Please note that the code above is just a sample: it is missing all the using statements for types implementing IDisposable, and it should have been written in a more consistent way.
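
One possible way to tighten the BOM check is sketched below. It is only an illustration, not a drop-in replacement: `BomSniffer`, `Detect` and `Decode` are hypothetical names, and the fallback to UTF-8 when no BOM is present is an assumption. It checks the longer BOMs first, tolerates inputs shorter than four bytes, and skips the BOM bytes when decoding.

```csharp
using System;
using System.Text;

public static class BomSniffer
{
    // Hypothetical helper: picks an encoding from the BOM and reports how many
    // BOM bytes to skip. Falls back to UTF-8 (an assumption) when there is no BOM.
    public static Encoding Detect(byte[] data, out int bomLength)
    {
        // Longer BOMs first: the UTF-32LE BOM begins with the UTF-16LE one.
        if (StartsWith(data, 0xFF, 0xFE, 0x00, 0x00)) { bomLength = 4; return new UTF32Encoding(false, true); }
        if (StartsWith(data, 0x00, 0x00, 0xFE, 0xFF)) { bomLength = 4; return new UTF32Encoding(true, true); }
        if (StartsWith(data, 0xEF, 0xBB, 0xBF)) { bomLength = 3; return Encoding.UTF8; }
        if (StartsWith(data, 0xFF, 0xFE)) { bomLength = 2; return Encoding.Unicode; }
        if (StartsWith(data, 0xFE, 0xFF)) { bomLength = 2; return Encoding.BigEndianUnicode; }
        bomLength = 0;
        return Encoding.UTF8;
    }

    // True when data begins with the given bytes; safe for short inputs.
    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // Decodes the bytes after the BOM using the detected encoding.
    public static string Decode(byte[] data)
    {
        int bomLength;
        Encoding encoding = Detect(data, out bomLength);
        return encoding.GetString(data, bomLength, data.Length - bomLength);
    }
}
```

In the sample above this would be used as `string text = BomSniffer.Decode(fileContent);`. Files without a BOM in a legacy code page will still be decoded incorrectly, which is where a charset detector helps.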

hth -g

Upvotes: 3
