nick_alot
nick_alot

Reputation: 3012

.NET Strings vs. Streams - Memory Profile and Characteristics

I need to pull large Unicode textual strings (e.g. 200Mb) from a Database (nvarchar) and store in memory for processing. i.e. I need random access to all parts of the strings.

Looking at this from strictly memory centric point of view, what are the pro’s and con’s of using a System.IO.MemoryStream versus a System.String as my in memory representation.

Some factors I am trying to research are:

I am looking for clarity and advice on these points, as well as any other memory considerations I have not thought of?

Note: There may be better way of processing these strings, but at this point I am realy just asking about the memory consideration of storing such an object.

Upvotes: 8

Views: 4025

Answers (2)

Marc Gravell
Marc Gravell

Reputation: 1062770

The memory of a string vs a stream is fairly inconsequential. Strings are utf-16, so there may be a small multiple involved, but because of the volumes involved, you would probably be best off writing the data to a scratch file.

To read the data out of the database, use streaming techniques; i.e. use IDataReader (ExecuteReader), with it in sequential-mode, and read chunks of bytes/characters. Don't attempt to read the entire column.

Also, with SQL Server 2008 you miht want to look at the file-stream type.

Examples:

Upvotes: 4

user49572
user49572

Reputation:

Looking at this from strictly memory centric point of view, what are the pro’s and con’s of using a System.IO.MemoryStream versus a System.String as my in memory representation.

Some factors I am trying to research are:

  • How these objects act in a [hypothetical] highly fragmented low memory environment

IMO, a MemoryStream is only useful when the encoding is trivial (e.g. ASCII, ISO-8859-X, etc.). If the encoding is UTF-8 and you have non-ASCII characters then processing will become more difficult. Sure, the MemoryStream will almost certainly consume less memory, but otherwise there's not much of a difference. Under the hood, a MemoryStream uses a byte array, which also needs to be allocated in the contiguous chunck of memory.

  • Actual size in memory (if stream is UTF8, have we nearly halved size)

Right, with purely ASCII chars, a MemoryStream will consume half of what the equivalent string consumes.

  • Is there another object I have not thought about?
List<byte> // has a nicer interface for processing

How are the strings stored in the database? varchar or nvarchar?

Regards,

Andreas

Upvotes: 6

Related Questions