Reputation: 5046
Suppose I have a file format which is composed of a series of objects, where each object has a header of the following format:
public struct FileObjectHeader {
    //The type of the object (not important for this question, but it exists)
    public byte TypeID;
    //The length of the object's data, which DOES NOT include the size of the header.
    public UInt16 Length;
}
followed by data with the specified length.
I read this data by first creating a list of locations for each object and the object's header:
struct FileObjectReference {
    public FileObjectHeader Header;
    public long Location;
}
public List<FileObject> ReadObjects(Stream s) {
    List<FileObjectReference> objectRefs = new List<FileObjectReference>();
    try {
        while (true) {
            FileObjectHeader header = ReadObjectHeader(s);
            //The above advances the stream by the size of the header as well.
            FileObjectReference reference = new FileObjectReference() { Header = header, Location = s.Position };
            objectRefs.Add(reference);
            //Advance the stream to the next object's header.
            s.Seek(header.Length, SeekOrigin.Current);
        }
    } catch (EndOfStreamException) {
        //Do nothing as this is an expected case.
    }
    //Now we read all of the objects that we've previously located.
    //This code isn't too important for the question but I'm including it for reference.
    List<FileObject> objects = new List<FileObject>();
    foreach (var reference in objectRefs) {
        s.Seek(reference.Location, SeekOrigin.Begin);
        objects.Add(ReadObject(reference.Header, s));
    }
    return objects;
}
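ReadObjectHeader itself is not shown here; as a rough sketch, it might look something like this (the read loop and BitConverter usage are my assumptions, not my actual code):
static FileObjectHeader ReadObjectHeader(Stream s) {
    var buffer = new byte[3];   //1-byte TypeID + 2-byte Length
    int read = 0;
    while (read < buffer.Length) {
        int n = s.Read(buffer, read, buffer.Length - read);
        if (n == 0) throw new EndOfStreamException();   //Short read: end of stream reached.
        read += n;
    }
    return new FileObjectHeader { TypeID = buffer[0], Length = BitConverter.ToUInt16(buffer, 1) };
}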
A few notes:
The ReadObjectHeader and ReadObject methods will throw an EndOfStreamException if they fail to read all of the needed data (i.e., if they reach the end of the stream).
The stream will usually be a FileStream, but I cannot guarantee that either. However, for this case I am mainly worried about FileStreams.
My question is this:
Since I am using FileStream.Seek, will seeking cause cases where it goes beyond the end of the stream and expands the file indefinitely? According to the docs:
You can seek to any location beyond the length of the stream. When you seek beyond the length of the file, the file size grows. In Windows NT and later versions, data added to the end of the file is set to zero. In Windows 98 or earlier versions, data added to the end of the file is not set to zero, which means that previously deleted data is visible to the stream.
The way that is stated, it seems like it could expand the file without me intending to, resulting in an ever-growing file as it keeps reading 3-byte headers and seeking past the end. In practice, that doesn't seem to happen, but I would like confirmation that it won't.
Upvotes: 3
Views: 1803
Reputation: 1069
To answer your question simply: the following code will not make your file grow. It will, however, throw an EndOfStreamException (from the explicit throw, since the read returns zero bytes). Only writing at a location beyond the end of the file will make your file grow. When the file grows, the data between the current end of file and the start of your write will be filled with zeros (unless you have enabled the sparse flag, in which case it will be marked as unallocated).
using (var fileStream = new FileStream("f", FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None))
{
    var buffer = new byte[10];
    fileStream.Seek(10, SeekOrigin.Begin);
    var bytesRead = fileStream.Read(buffer, 0, 10);
    if (bytesRead == 0) {
        throw new EndOfStreamException();
    }
}
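For comparison, here is a minimal sketch (same placeholder file name, arbitrary offset) showing that it is the write after the seek that actually makes the file grow:
using (var fileStream = new FileStream("f", FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None))
{
    fileStream.Seek(100, SeekOrigin.Begin);  //Seeking alone does not change fileStream.Length.
    fileStream.WriteByte(0xFF);              //The write extends the file to 101 bytes;
                                             //bytes 0-99 are filled with zeros (NT and later).
}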
Since you are reading/writing binary structured data, I would suggest three things:
Use MemoryMappedFile, and unsafe pointers to access your data (if your app will run on Windows only). You can also use a ViewAccessor, but you may find this to be slower than doing the caching yourself due to the extra copies made by interop. If you go the unsafe route, here is code which will quickly copy your structures into a byte buffer:
using System.Runtime.InteropServices;

internal static class Native
{
    //kernel32 exports RtlMoveMemory; "CopyMemory" is only a C macro for it.
    [DllImport("kernel32.dll", EntryPoint = "RtlMoveMemory", SetLastError = false)]
    private static extern unsafe void CopyMemory(void* dest, void* src, int count);

    private static unsafe byte[] Serialize(TestStruct[] index)
    {
        var buffer = new byte[Marshal.SizeOf(typeof(TestStruct)) * index.Length];
        fixed (void* src = &index[0])
        {
            fixed (void* dst = &buffer[0])
            {
                //Copy from the structure array into the byte buffer (dest, src, count).
                CopyMemory(dst, src, buffer.Length);
            }
        }
        return buffer;
    }
}
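For the safe ViewAccessor route, a rough sketch of reading one header through a MemoryMappedViewAccessor could look like the following (the Pack = 1 layout attribute and the helper names are my assumptions, not part of the answer above):
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

//Assumed Pack = 1 so the header occupies exactly 3 bytes, matching the file format.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct FileObjectHeader {
    public byte TypeID;
    public ushort Length;
}

public static class HeaderReader {
    //Reads the header located at the given file offset through a view accessor.
    public static FileObjectHeader ReadHeader(string path, long offset) {
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var accessor = mmf.CreateViewAccessor()) {
            FileObjectHeader header;
            accessor.Read(offset, out header);
            return header;
        }
    }
}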
Upvotes: 2
Reputation: 106920
The documentation for FileStream.Read(), however, says:
Return Value
Type: System.Int32
The total number of bytes read into the buffer. This might be less than the number of bytes requested if that number of bytes are not currently available, or zero if the end of the stream is reached.
Thus I strongly suspect (but you should verify this yourself) that this seeking-beyond-the-end only applies to cases where you write to the file afterwards. This makes sense - you can reserve space if you know that you'll need it, without actually writing anything in it (which would be slow).
When reading, however, my guess is that you should get 0 in return and no data would be read. Also, no file expansion.
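If you want to verify this yourself, a quick sketch (the file path is just a placeholder) is to seek past the end of a read-only stream and confirm that nothing is read and the length is unchanged:
using (var fs = new FileStream("data.bin", FileMode.Open, FileAccess.Read))
{
    long originalLength = fs.Length;
    fs.Seek(originalLength + 100, SeekOrigin.Begin);  //Position is now past the end.
    int read = fs.Read(new byte[10], 0, 10);          //Returns 0 at/after end of stream.
    Console.WriteLine(read == 0 && fs.Length == originalLength);  //Expect: True
}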
Upvotes: 3