Reputation: 2178
I am getting Out of Memory exceptions when retrieving large blob data from SQL Server. I am calling a stored procedure which returns 6 columns of simple data and 1 varbinary(max) data column.
I am using this code to execute the stored procedure:
m_DataReader = cmd.ExecuteReader(CommandBehavior.SequentialAccess);
and am ensuring that I read columns from the data reader in sequential column order (see the MSDN article on retrieving large data).
For the varbinary(max) column, I am reading the data out like this:
DocBytes = m_DataReader.GetValue(i) as byte[];
What I have noticed is that at the point of the Out of Memory exception, I appear to have two copies of the byte array in memory: one in the DocBytes array, and the other in an internal buffer of the SqlDataReader.
Why is there a copy of this data? I assumed I would be passing a reference, or is this due to the internal way that SqlDataReader provides the data, i.e. does it always provide a copy?
Is there a more memory efficient way of reading the data from the database?
I have looked at the new .NET 4.5 GetStream method, but unfortunately I do not have the ability to pass the stream on - I need the bytes in memory - so I cannot follow other examples of streaming into a file or web response. But I want to try to ensure that only one copy exists in memory at a time!
I have come to the conclusion that this is probably just the way it has to be and that the duplicate copy is just a buffer that has yet to be garbage collected. I really don't want to have to muck about with forcing garbage collection, and am hoping someone has some ideas about alternative approaches.
Upvotes: 3
Views: 6245
Reputation: 21
The problem is that DbDataReader.GetStream() creates a MemoryStream and fills this stream with the data of the field. To avoid this, I've created an extension method:
using System;
using System.Data;
using System.IO;

public static class DataReaderExtensions
{
    /// <summary>
    /// Writes the content of the field into a stream.
    /// </summary>
    /// <param name="reader">The open data reader to read from.</param>
    /// <param name="ordinal">The zero-based ordinal of the binary column.</param>
    /// <param name="stream">The destination stream.</param>
    /// <returns>Number of written bytes.</returns>
    public static long WriteToStream(this IDataReader reader, int ordinal, Stream stream)
    {
        if (stream == null)
            throw new ArgumentNullException("stream");
        if (reader.IsDBNull(ordinal))
            return 0;

        long num = 0L;
        byte[] array = new byte[8192];
        long bytes;
        do
        {
            // GetBytes copies the next chunk into the 8 KB buffer and
            // returns the number of bytes actually read.
            bytes = reader.GetBytes(ordinal, num, array, 0, array.Length);
            stream.Write(array, 0, (int)bytes);
            num += bytes;
        }
        while (bytes > 0L);
        return num;
    }

    /// <summary>
    /// Writes the content of the field into a stream.
    /// </summary>
    /// <param name="reader">The open data reader to read from.</param>
    /// <param name="field">The name of the binary column.</param>
    /// <param name="stream">The destination stream.</param>
    /// <returns>Number of written bytes.</returns>
    public static long WriteToStream(this IDataReader reader, string field, Stream stream)
    {
        int ordinal = reader.GetOrdinal(field);
        return WriteToStream(reader, ordinal, stream);
    }
}
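Usage is then a matter of handing it any destination stream. For example (a sketch reusing the question's m_DataReader; the "DocData" column name and file path are illustrative):
// Stream the blob column straight to a file instead of
// materializing it as a byte[] first.
using (var file = new FileStream(@"C:\temp\doc.bin", FileMode.Create, FileAccess.Write))
{
    long written = m_DataReader.WriteToStream("DocData", file);
}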
Upvotes: 2
Reputation: 82186
DocBytes = m_DataReader.GetValue(i) as byte[];
This will create a buffer of DATALENGTH(column_name) bytes, which will then be copied in full to your MemoryStream. That's bad when DATALENGTH(column_name) is a large value.
You'll need to copy it to the MemoryStream via a buffer. Also, if your file is THAT large, write it to a temporary file instead of storing it in full in a MemoryStream.
This is how I do it:
// http://stackoverflow.com/questions/2885335/clr-sql-assembly-get-the-bytestream
// http://stackoverflow.com/questions/891617/how-to-read-a-image-by-idatareader
// http://stackoverflow.com/questions/4103406/extracting-a-net-assembly-from-sql-server-2005
public static void RetrieveFileStream(System.Data.IDbCommand cmd, string columnName, string path)
{
    using (System.Data.IDataReader reader = cmd.ExecuteReader(System.Data.CommandBehavior.SequentialAccess | System.Data.CommandBehavior.CloseConnection))
    {
        bool hasRows = reader.Read();
        if (hasRows)
        {
            const int BUFFER_SIZE = 1024 * 1024 * 10; // 10 MB
            byte[] buffer = new byte[BUFFER_SIZE];
            int col = string.IsNullOrEmpty(columnName) ? 0 : reader.GetOrdinal(columnName);
            int bytesRead = 0;
            int offset = 0;

            // Write the byte stream out to disk
            using (System.IO.FileStream bytestream = new System.IO.FileStream(path, System.IO.FileMode.Create, System.IO.FileAccess.Write, System.IO.FileShare.None))
            {
                while ((bytesRead = (int)reader.GetBytes(col, offset, buffer, 0, BUFFER_SIZE)) > 0)
                {
                    bytestream.Write(buffer, 0, bytesRead);
                    offset += bytesRead;
                } // Whend
            } // End Using bytestream
        } // End if (hasRows)
        reader.Close();
    } // End Using reader
} // End Function RetrieveFileStream
Adapting this code to write to a MemoryStream is simple. You may need to make the buffer size smaller or larger.
public static System.IO.MemoryStream RetrieveMemoryStream(System.Data.IDbCommand cmd, string columnName)
{
    System.IO.MemoryStream ms = new System.IO.MemoryStream();

    using (System.Data.IDataReader reader = cmd.ExecuteReader(System.Data.CommandBehavior.SequentialAccess | System.Data.CommandBehavior.CloseConnection))
    {
        bool hasRows = reader.Read();
        if (hasRows)
        {
            const int BUFFER_SIZE = 1024 * 1024 * 10; // 10 MB
            byte[] buffer = new byte[BUFFER_SIZE];
            int col = string.IsNullOrEmpty(columnName) ? 0 : reader.GetOrdinal(columnName);
            int bytesRead = 0;
            int offset = 0;

            // Copy the byte stream into memory
            while ((bytesRead = (int)reader.GetBytes(col, offset, buffer, 0, BUFFER_SIZE)) > 0)
            {
                ms.Write(buffer, 0, bytesRead);
                offset += bytesRead;
            } // Whend
        } // End if (hasRows)
        reader.Close();
    } // End Using reader

    ms.Position = 0; // rewind so the caller can read from the start
    return ms;
} // End Function RetrieveMemoryStream
If you need to put it into Response.OutputStream, consider writing it there directly, not via MemoryStream.ToArray() + WriteBytes.
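A minimal sketch of that idea, assuming classic ASP.NET (System.Web) with Response in scope, and a reader opened with CommandBehavior.SequentialAccess where col is the blob column ordinal:
// Each chunk goes straight to the client; no intermediate MemoryStream.
byte[] buffer = new byte[81920]; // 80 KB chunks
long offset = 0;
long bytesRead;
while ((bytesRead = reader.GetBytes(col, offset, buffer, 0, buffer.Length)) > 0)
{
    Response.OutputStream.Write(buffer, 0, (int)bytesRead);
    offset += bytesRead;
}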
Upvotes: 0
Reputation: 171178
Do you know the length of the data? In that case you can use the streaming approach to copy the data into a perfectly-sized byte[]. That will get rid of the double-buffering that seems to occur inside ADO.NET in the non-streaming case.
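A minimal sketch of that idea, assuming the query is changed to return DATALENGTH(DocData) ahead of the blob (both column names are hypothetical):
// With SequentialAccess the length column must be read before the blob.
using (var reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
{
    while (reader.Read())
    {
        long length = reader.GetInt64(0);         // DATALENGTH(DocData)
        byte[] docBytes = new byte[length];       // the only full-size allocation
        using (Stream blob = reader.GetStream(1)) // the varbinary(max) column
        {
            int total = 0, read;
            while (total < docBytes.Length &&
                   (read = blob.Read(docBytes, total, docBytes.Length - total)) > 0)
            {
                total += read;
            }
        }
        // docBytes now holds the data without a second full-size buffer.
    }
}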
Upvotes: 0
Reputation: 3713
You have a choice when retrieving binary data from SQL. Assuming you're using varbinary (image is deprecated) as your data type, you can either return all the data or return just some of it using a simple SUBSTRING function. If the binary is huge (like 1 GB), returning all of the data will be very memory intensive.
If that's the case, you have the option of taking a more iterative approach to returning the data. Say it's a 1 GB binary: you can have the program cycle through the data in 100 MB chunks, writing each chunk to disk and discarding the buffer before returning for the next 100 MB chunk (a sketch of the client-side loop follows the SQL below).
To get the first chunk you'd use:
Declare @ChunkCounter as integer = 0
Declare @Data as varbinary(max)
Declare @ChunkSize as integer = 10000000
Declare @Bytes as bigint

Select @Bytes = datalength(YourField) from YourTable where ID = YourID

If @Bytes > @ChunkSize
Begin
    /* use substring to get the first chunk; T-SQL substring is 1-based */
    Select @Data = substring(YourField, 1, @ChunkSize),
           @ChunkCounter = @ChunkCounter + 1
    FROM YourTable
    where ID = YourID
End
Else
Begin ....
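On the client side, a hedged sketch of that chunk loop (connectionString, path and yourId are illustrative, and the table/column names match the SQL above):
// Fetch the blob in 100 MB chunks via SUBSTRING; each chunk is written to
// disk and becomes collectible before the next one is requested.
const int ChunkSize = 100 * 1024 * 1024;
using (var conn = new SqlConnection(connectionString))
using (var file = File.Create(path))
{
    conn.Open();
    long offset = 1; // T-SQL SUBSTRING is 1-based
    while (true)
    {
        using (var cmd = new SqlCommand(
            "SELECT SUBSTRING(YourField, @offset, @chunk) FROM YourTable WHERE ID = @id",
            conn))
        {
            cmd.Parameters.AddWithValue("@offset", offset);
            cmd.Parameters.AddWithValue("@chunk", ChunkSize);
            cmd.Parameters.AddWithValue("@id", yourId);
            byte[] chunk = cmd.ExecuteScalar() as byte[];
            if (chunk == null || chunk.Length == 0)
                break; // past the end of the data
            file.Write(chunk, 0, chunk.Length);
            offset += chunk.Length;
        }
    }
}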
Upvotes: 1
Reputation: 1038830
I have looked at the new .NET 4.5 GetStream method, but unfortunately, I do not have the ability to pass the stream on - I need the bytes in memory
So all you have to do is read from this stream into a byte array.
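For example (a sketch; i is the blob column ordinal and the 8 KB copy size is arbitrary):
// Drain the column stream into a MemoryStream in small chunks. Note that
// ToArray() still makes one final copy; if the length is known up front,
// read directly into a pre-sized byte[] instead.
using (Stream blob = reader.GetStream(i))
using (var ms = new MemoryStream())
{
    blob.CopyTo(ms, 8192);
    byte[] docBytes = ms.ToArray();
}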
Alternatively you could try reading it in small chunks from the reader using the GetBytes method, as shown here: https://stackoverflow.com/a/625485/29407
Upvotes: 1