Reputation: 3012
I need to pull large Unicode textual strings (e.g. 200Mb) from a Database (nvarchar) and store in memory for processing. i.e. I need random access to all parts of the strings.
Looking at this from strictly memory centric point of view, what are the pro’s and con’s of using a System.IO.MemoryStream versus a System.String as my in memory representation.
Some factors I am trying to research are:
I am looking for clarity and advice on these points, as well as any other memory considerations I have not thought of?
Note: There may be better way of processing these strings, but at this point I am realy just asking about the memory consideration of storing such an object.
Upvotes: 8
Views: 4025
Reputation: 1062770
The memory of a string vs a stream is fairly inconsequential. Strings are utf-16, so there may be a small multiple involved, but because of the volumes involved, you would probably be best off writing the data to a scratch file.
To read the data out of the database, use streaming techniques; i.e. use IDataReader (ExecuteReader), with it in sequential-mode, and read chunks of bytes/characters. Don't attempt to read the entire column.
Also, with SQL Server 2008 you miht want to look at the file-stream type.
Examples:
Upvotes: 4
Reputation:
Looking at this from strictly memory centric point of view, what are the pro’s and con’s of using a System.IO.MemoryStream versus a System.String as my in memory representation.
Some factors I am trying to research are:
- How these objects act in a [hypothetical] highly fragmented low memory environment
IMO, a MemoryStream is only useful when the encoding is trivial (e.g. ASCII, ISO-8859-X, etc.). If the encoding is UTF-8 and you have non-ASCII characters then processing will become more difficult. Sure, the MemoryStream will almost certainly consume less memory, but otherwise there's not much of a difference. Under the hood, a MemoryStream uses a byte array, which also needs to be allocated in the contiguous chunck of memory.
- Actual size in memory (if stream is UTF8, have we nearly halved size)
Right, with purely ASCII chars, a MemoryStream will consume half of what the equivalent string consumes.
- Is there another object I have not thought about?
List<byte> // has a nicer interface for processing
How are the strings stored in the database? varchar or nvarchar?
Regards,
Andreas
Upvotes: 6