Luiey

Reputation: 873

Exception when reading millions of characters from a JSON file [OutOfMemoryException]

I have downloaded a JSON file recorded from Azure Blob Storage. The file size is 137MB.

The character and line counts that Notepad++ reports for the file are shown in the image below (screenshot).

The file opens in around 1-2 seconds when I choose "Edit with Notepad++" from the file context menu. So I decided to write a program that converts JSON to CSV, but I have run into exceptions. For now, I show the JSON content in a RichTextBox so that I can inspect it before deciding to convert it to CSV.

The event handler that starts the load:

private async void txtjsonname_DoubleClick(object sender, EventArgs e)
{
    OpenFileDialog ofd = new OpenFileDialog();
    ofd.Filter = "JSON Files (*.json)|*.json";
    ofd.InitialDirectory = @"C:\";
    ofd.Title = "Select single json file to be converted";
    ofd.Multiselect = false;
    if (ofd.ShowDialog() == DialogResult.OK)
    {
        rtbstat.Text = null;
        txtcsvname.Text = null;
        txtjsonname.Text = null;
        lblcsvpath.Text = null;
        lbljsonpath.Text = null;
        rtbjson.Clear();
        txtjsonname.Text = Path.GetFileName(ofd.FileName);
        lbljsonpath.Text = Path.GetDirectoryName(ofd.FileName);

        if (await LoadJSONtoRTB(ofd.FileName))
        {
            rtbjson.WordWrap = false;
            rtbstat.Text = "Load file finished! " + (rtbjson.Lines.Count()).ToString() + " line(s) detected | " + rtbjson.Text.Length.ToString() + " character(s) detected";
            txtcsvname.Text = Path.GetFileNameWithoutExtension(ofd.FileName) + ".csv";
        }
    }
    await Task.Delay(1000);
}

The code I have tried, and the exceptions I ran into:

First approach, first version:

private async Task<bool> LoadJSONtoRTB(string path)
{
    try
    {
        foreach (var line in File.ReadLines(path))
        {
            rtbjson.Text = line;
        }
        await Task.Delay(10);
        return true;
    }
    catch (Exception)
    {
        return false;
    }
}

First approach, second version:

private async Task<bool> LoadJSONtoRTB(string path)
{
    try
    {
        using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (BufferedStream bs = new BufferedStream(fs))
        using (StreamReader sr = new StreamReader(bs))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                rtbjson.AppendText(line);
            }
        }
        await Task.Delay(10);
        return true;
    }
    catch (Exception)
    {
        return false;
    }
}

Exception: An unhandled exception of type 'System.AccessViolationException' occurred in System.Windows.Forms.dll

Additional information: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

Second approach:

private async Task<bool> LoadJSONtoRTB(string path)
{
    try
    {
        StreamReader sr = new StreamReader(@path);
        while (!sr.EndOfStream)
            rtbjson.Text += sr.ReadLine();
        await Task.Delay(10);
        return true;
    }
    catch (Exception)
    {
        return false;
    }
}

With the above code, the load had been running for about 12 minutes when I set a breakpoint to check the progress (screenshot).

After 12 minutes, only 6 million characters had been read.

Is there any way to display a text file (JSON/TXT) of, for example, 64 million characters as quickly as Notepad++, which takes only 1-2 seconds to open it?

Upvotes: 4

Views: 435

Answers (2)

lobiZoli

Reputation: 199

Your LoadJSONtoRTB method runs asynchronously, so you are trying to update the GUI (the text box) from the wrong thread. The following runs the GUI update on the right thread:

this.Invoke(new Action(() => { rtbjson.Text += sr.ReadLine(); }));

Of course, there are more efficient ways to populate the control with a large amount of text, such as building it with a StringBuilder. The important takeaway is to always update the GUI on the GUI thread, which can be done by calling Invoke on the form (Control.Invoke).
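A minimal sketch of the StringBuilder idea mentioned above, assuming the goal is to assemble the whole text first and assign it to the control once. The class and method names (JsonTextLoader, ReadAllLinesToString) are hypothetical and not part of the question's code. Each `rtbjson.Text += line` copies the entire accumulated string again, which is likely why the second approach in the question slows down as the text grows.

```csharp
using System;
using System.IO;
using System.Text;

public static class JsonTextLoader
{
    // Hypothetical helper: reads the file line by line into a single
    // StringBuilder, so the accumulated text is not re-copied on every line.
    public static string ReadAllLinesToString(string path)
    {
        var sb = new StringBuilder();
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // ReadLine strips the line terminator, so add one back;
                // otherwise all lines would be fused into one.
                sb.Append(line).Append('\n');
            }
        }
        return sb.ToString();
    }
}
```

If the helper is called from a worker thread, the result can then be handed to the control in a single Invoke call rather than one call per line:

string text = JsonTextLoader.ReadAllLinesToString(path);
this.Invoke(new Action(() => { rtbjson.Text = text; }));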

Upvotes: 1

Matteo Umili

Reputation: 4017

I suspect that Notepad++ loads the entire file into memory with something equivalent to System.IO.File.ReadAllText. Also, there is no benefit in appending every line of the file to the string one at a time: the final result occupies the same memory. With RichTextBox, the best you can do is:

richTextBox1.Text = System.IO.File.ReadAllText(filePath);
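As a rough sketch of how that single assignment could be combined with the async handler from the question, the file read can be pushed to a thread-pool thread so the form stays responsive while the file is loading. LargeFileReader and ReadAllTextInBackground are hypothetical names introduced here; Task.Run and File.ReadAllText are the standard .NET calls.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class LargeFileReader
{
    // Hypothetical helper: performs the blocking read on a thread-pool thread
    // and returns the complete file content as one string.
    public static Task<string> ReadAllTextInBackground(string path)
    {
        return Task.Run(() => File.ReadAllText(path));
    }
}
```

Inside the question's LoadJSONtoRTB, the loop would then be replaced by one line:

rtbjson.Text = await LargeFileReader.ReadAllTextInBackground(path);

When the await starts on the UI thread, as it does in a Windows Forms event handler, the continuation normally resumes on that thread, so this assignment should not need Invoke. A .NET string uses two bytes per character, so 64 million characters take roughly 128 MB for the string alone, before any copy the control makes.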

In any case, Notepad++ uses Scintilla, which is faster than RichTextBox.

You can try ScintillaNET, which is a wrapper around Scintilla.

You can set the control's text the same way you do with RichTextBox:

scintilla1.Text = System.IO.File.ReadAllText(filePath);

Upvotes: 2
