Reputation: 2178
I am getting Out of Memory exceptions when retrieving large blob data from SQL Server. I am calling a stored procedure which returns 6 columns of simple data and 1 varbinary(max) data column.
I am using this code to execute the stored procedure:
m_DataReader = cmd.ExecuteReader(CommandBehavior.SequentialAccess);
and am ensuring that I read columns from the data reader in sequential column order (see the MSDN article on retrieving large data).
For the varbinary(max) column, I am reading the data out like this:
DocBytes = m_DataReader.GetValue(i) as byte[];
What I have noticed is that at the point of the Out of Memory exception, I appear to have two copies of the byte array in memory: one in the DocBytes array, and the other in an internal buffer of the SqlDataReader.
Why is there a copy of this data? I assumed I would be passing a reference, or is this due to the internal way that SqlDataReader provides the data, i.e. does it always provide a copy?
Is there a more memory efficient way of reading the data from the database?
I have looked at the new .NET 4.5 GetStream method, but unfortunately I do not have the ability to pass the stream on - I need the bytes in memory - so I cannot follow other examples of streaming into a file or web response. But I want to try to ensure that only one copy exists in memory at a time!
I have come to the conclusion that this is probably just the way it has to be and that the duplicate copy is just a buffer that has yet to be garbage collected. I really don't want to have to muck about with forcing garbage collection, and am hoping someone has some ideas about alternative approaches.
Upvotes: 3
Views: 6245
Reputation: 21
The problem is that DbDataReader.GetStream() creates a MemoryStream and fills this stream with the data of the field. To avoid this, I've created an extension method:
using System;
using System.Data;
using System.IO;

public static class DataReaderExtensions
{
    /// <summary>
    /// Writes the content of the field into a stream.
    /// </summary>
    /// <param name="reader">The open data reader to read from.</param>
    /// <param name="ordinal">The zero-based ordinal of the binary column.</param>
    /// <param name="stream">The destination stream.</param>
    /// <returns>Number of written bytes.</returns>
    public static long WriteToStream(this IDataReader reader, int ordinal, Stream stream)
    {
        if (stream == null)
            throw new ArgumentNullException("stream");
        if (reader.IsDBNull(ordinal))
            return 0;

        long num = 0L;
        byte[] array = new byte[8192];
        long bytes;
        do
        {
            // GetBytes copies the next chunk into the 8 KB buffer and
            // returns the number of bytes actually read.
            bytes = reader.GetBytes(ordinal, num, array, 0, array.Length);
            stream.Write(array, 0, (int)bytes);
            num += bytes;
        }
        while (bytes > 0L);
        return num;
    }

    /// <summary>
    /// Writes the content of the field into a stream.
    /// </summary>
    /// <param name="reader">The open data reader to read from.</param>
    /// <param name="field">The name of the binary column.</param>
    /// <param name="stream">The destination stream.</param>
    /// <returns>Number of written bytes.</returns>
    public static long WriteToStream(this IDataReader reader, string field, Stream stream)
    {
        int ordinal = reader.GetOrdinal(field);
        return WriteToStream(reader, ordinal, stream);
    }
}
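Usage is then a matter of handing it any destination stream. For example (a sketch reusing the question's m_DataReader; the "DocData" column name and file path are illustrative):
// Stream the blob column straight to a file instead of
// materializing it as a byte[] first.
using (var file = new FileStream(@"C:\temp\doc.bin", FileMode.Create, FileAccess.Write))
{
    long written = m_DataReader.WriteToStream("DocData", file);
}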
Upvotes: 2
Reputation: 82186
DocBytes = m_DataReader.GetValue(i) as byte[];
This will create a buffer of DATALENGTH(column_name) bytes, which will then be copied in full to your MemoryStream. That's bad when DATALENGTH(column_name) is a large value.
You'll need to copy it to the MemoryStream via a buffer. Also, if your file is THAT large, write it to a temporary file instead of storing it in full in a MemoryStream.
This is how I do it:
// http://stackoverflow.com/questions/2885335/clr-sql-assembly-get-the-bytestream
// http://stackoverflow.com/questions/891617/how-to-read-a-image-by-idatareader
// http://stackoverflow.com/questions/4103406/extracting-a-net-assembly-from-sql-server-2005
public static void RetrieveFileStream(System.Data.IDbCommand cmd, string columnName, string path)
{
    using (System.Data.IDataReader reader = cmd.ExecuteReader(System.Data.CommandBehavior.SequentialAccess | System.Data.CommandBehavior.CloseConnection))
    {
        bool hasRows = reader.Read();
        if (hasRows)
        {
            const int BUFFER_SIZE = 1024 * 1024 * 10; // 10 MB
            byte[] buffer = new byte[BUFFER_SIZE];
            int col = string.IsNullOrEmpty(columnName) ? 0 : reader.GetOrdinal(columnName);
            int bytesRead = 0;
            int offset = 0;

            // Write the byte stream out to disk
            using (System.IO.FileStream bytestream = new System.IO.FileStream(path, System.IO.FileMode.Create, System.IO.FileAccess.Write, System.IO.FileShare.None))
            {
                while ((bytesRead = (int)reader.GetBytes(col, offset, buffer, 0, BUFFER_SIZE)) > 0)
                {
                    bytestream.Write(buffer, 0, bytesRead);
                    offset += bytesRead;
                } // Whend
            } // End Using bytestream
        } // End if (hasRows)
        reader.Close();
    } // End Using reader
} // End Function RetrieveFileStream
Adapting this code to write to a MemoryStream is simple. You may need to make the buffer size smaller or larger.
public static System.IO.MemoryStream RetrieveMemoryStream(System.Data.IDbCommand cmd, string columnName)
{
    System.IO.MemoryStream ms = new System.IO.MemoryStream();

    using (System.Data.IDataReader reader = cmd.ExecuteReader(System.Data.CommandBehavior.SequentialAccess | System.Data.CommandBehavior.CloseConnection))
    {
        bool hasRows = reader.Read();
        if (hasRows)
        {
            const int BUFFER_SIZE = 1024 * 1024 * 10; // 10 MB
            byte[] buffer = new byte[BUFFER_SIZE];
            int col = string.IsNullOrEmpty(columnName) ? 0 : reader.GetOrdinal(columnName);
            int bytesRead = 0;
            int offset = 0;

            // Copy the byte stream into memory
            while ((bytesRead = (int)reader.GetBytes(col, offset, buffer, 0, BUFFER_SIZE)) > 0)
            {
                ms.Write(buffer, 0, bytesRead);
                offset += bytesRead;
            } // Whend
        } // End if (hasRows)
        reader.Close();
    } // End Using reader

    ms.Position = 0; // rewind so the caller can read from the start
    return ms;
} // End Function RetrieveMemoryStream
If you need to put it into Response.OutputStream, consider writing it there directly, not via MemoryStream.ToArray() + WriteBytes.
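A minimal sketch of that idea, assuming classic ASP.NET (System.Web) with Response in scope, and a reader opened with CommandBehavior.SequentialAccess where col is the blob column ordinal:
// Each chunk goes straight to the client; no intermediate MemoryStream.
byte[] buffer = new byte[81920]; // 80 KB chunks
long offset = 0;
long bytesRead;
while ((bytesRead = reader.GetBytes(col, offset, buffer, 0, buffer.Length)) > 0)
{
    Response.OutputStream.Write(buffer, 0, (int)bytesRead);
    offset += bytesRead;
}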
Upvotes: 0
Reputation: 171178
Do you know the length of the data? In that case you can use the streaming approach to copy the data into a perfectly-sized byte[]. That will get rid of the double-buffering that seems to occur inside ADO.NET in the non-streaming case.
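A minimal sketch of that idea, assuming the query is changed to return DATALENGTH(DocData) ahead of the blob (both column names are hypothetical):
// With SequentialAccess the length column must be read before the blob.
using (var reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
{
    while (reader.Read())
    {
        long length = reader.GetInt64(0);         // DATALENGTH(DocData)
        byte[] docBytes = new byte[length];       // the only full-size allocation
        using (Stream blob = reader.GetStream(1)) // the varbinary(max) column
        {
            int total = 0, read;
            while (total < docBytes.Length &&
                   (read = blob.Read(docBytes, total, docBytes.Length - total)) > 0)
            {
                total += read;
            }
        }
        // docBytes now holds the data without a second full-size buffer.
    }
}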
Upvotes: 0
Reputation: 3713
You have a choice when retrieving binary data from SQL. Assuming you're using varbinary (image is deprecated) as your data type, you can either return all the data or return just some of it using a simple SUBSTRING function. If the binary is huge (like 1 GB), returning all of the data will be very memory intensive.
If that's the case, you have the option of taking a more iterative approach to returning the data. Say it's a 1 GB binary: you can have the program cycle through the data in 100 MB chunks, writing each chunk to disk and discarding the buffer before returning for the next 100 MB chunk (a sketch of the client-side loop follows the SQL below).
To get the first chunk you'd use:
Declare @ChunkCounter as integer = 0
Declare @Data as varbinary(max)
Declare @ChunkSize as integer = 10000000
Declare @Bytes as bigint

Select @Bytes = datalength(YourField) from YourTable where ID = YourID

If @Bytes > @ChunkSize
Begin
    /* use substring to get the first chunk; T-SQL substring is 1-based */
    Select @Data = substring(YourField, 1, @ChunkSize),
           @ChunkCounter = @ChunkCounter + 1
    FROM YourTable
    where ID = YourID
End
Else
Begin ....
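On the client side, a hedged sketch of that chunk loop (connectionString, path and yourId are illustrative, and the table/column names match the SQL above):
// Fetch the blob in 100 MB chunks via SUBSTRING; each chunk is written to
// disk and becomes collectible before the next one is requested.
const int ChunkSize = 100 * 1024 * 1024;
using (var conn = new SqlConnection(connectionString))
using (var file = File.Create(path))
{
    conn.Open();
    long offset = 1; // T-SQL SUBSTRING is 1-based
    while (true)
    {
        using (var cmd = new SqlCommand(
            "SELECT SUBSTRING(YourField, @offset, @chunk) FROM YourTable WHERE ID = @id",
            conn))
        {
            cmd.Parameters.AddWithValue("@offset", offset);
            cmd.Parameters.AddWithValue("@chunk", ChunkSize);
            cmd.Parameters.AddWithValue("@id", yourId);
            byte[] chunk = cmd.ExecuteScalar() as byte[];
            if (chunk == null || chunk.Length == 0)
                break; // past the end of the data
            file.Write(chunk, 0, chunk.Length);
            offset += chunk.Length;
        }
    }
}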
Upvotes: 1
Reputation: 1038830
I have looked at the new .NET 4.5 GetStream method, but unfortunately, I do not have the ability to pass the stream on - I need the bytes in memory
So all you have to do is read from this stream into a byte array.
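For example (a sketch; i is the blob column ordinal and the 8 KB copy size is arbitrary):
// Drain the column stream into a MemoryStream in small chunks. Note that
// ToArray() still makes one final copy; if the length is known up front,
// read directly into a pre-sized byte[] instead.
using (Stream blob = reader.GetStream(i))
using (var ms = new MemoryStream())
{
    blob.CopyTo(ms, 8192);
    byte[] docBytes = ms.ToArray();
}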
Alternatively you could try reading it in small chunks from the reader using the GetBytes method, as shown here: https://stackoverflow.com/a/625485/29407
Upvotes: 1