Reputation: 6122
I have a 6 GB file and the last 20 lines are bad. I would like to use a memory-mapped file with .NET 4 to read the last few lines and display them with Console.WriteLine, and later go to the last 20 lines and replace them with String.Empty. What is a cool way to do that using a memory-mapped file/stream, with a C# example?
Thanks.
Upvotes: 0
Views: 4618
Reputation: 669
First of all, I will write the code in F# since my C# is rusty, but it should be possible to translate it into C#.
Second, as I understand it, you need an efficient way to access the content of some file, alter it, and then write it back.
To use a memory-mapped file this way, you first copy the whole file into a temporary memory-mapped file, tmp. This should cause only a little overhead because the copy is done in one pass. Then you use tmp to alter the content, and only after that is done do you write the new content back to the file. This will probably be faster than using a normal FileStream, and you should not have to worry about stack/heap overflow.
open System.IO
open System.IO.MemoryMappedFiles

// Create a memory-mapped image of the file content, i.e. copy the content,
// and return the memory-mapped file.
// `use` is the same as `using` in C#.
let createMappedImage (path : string) =
    let length = FileInfo(path).Length
    let mmf = MemoryMappedFile.CreateNew("tmp", length)
    use source = File.OpenRead(path)
    use view = mmf.CreateViewStream(0L, length)
    source.CopyTo(view) // streamed copy; ReadAllText cannot hold a 6 GB file
    mmf // return the memory-mapped file to be used

// Some manipulation functions to apply to the image

// Fill the buffer from the source, one byte at a time.
// type : byte[] -> Stream -> int
let fillBuffer (buffer : byte[]) (source : Stream) =
    let mutable entry = 0
    let mutable ret = source.ReadByte() // returns -1 at end of stream
    while ret >= 0 && entry < buffer.Length do
        buffer.[entry] <- byte ret
        entry <- entry + 1
        // read ahead only while there is still room in the buffer
        ret <- if entry < buffer.Length then source.ReadByte() else -1
    entry // return the count of bytes read

// type : int -> byte[] -> Stream -> unit
let flushBuffer count (buffer : byte[]) (dest : Stream) =
    let mutable entry = 0
    while entry < count do
        dest.WriteByte(buffer.[entry])
        entry <- entry + 1
    // returns unit, i.e. void

// Read, then write, the buffer one time:
// writeThrough calls fillBuffer, which returns the count of bytes read,
// and passes it to flushBuffer, which writes them to the destination.
let writeThrough buffer source dest =
    flushBuffer (fillBuffer buffer source) buffer dest
    // returns unit

// Write the (altered) content of the image back without overflow.
// length is the number of bytes of the image to write back.
let writeBackMappedImage bufSize (dest : string) (length : int64) (image : MemoryMappedFile) =
    // buffer for read/write
    let buf : byte[] = Array.zeroCreate bufSize // a normal page is 4096 bytes
    // FileMode.Truncate deletes the old content on open
    use writer = File.Open(dest, FileMode.Truncate, FileAccess.Write)
    use reader = image.CreateViewStream(0L, length)
    while reader.Position < reader.Length do
        writeThrough buf reader writer

// Placeholder for the undefined function that corrects the content.
let alteration (image : MemoryMappedFile) = image

let path = "some path"
let length = FileInfo(path).Length
let image = createMappedImage path
let alteredImage = alteration image
writeBackMappedImage 4096 path length alteredImage
image.Dispose()
This hasn't been run, so there are likely to be some errors, but I think the idea is clear. As said above, createMappedImage creates a memory-mapped image of the file.
fillBuffer takes a byte array and a reader, fills the array, and returns the number of bytes read. flushBuffer takes a count of how much of the buffer should be flushed, the buffer, and a destination writer.
Anything you need to do to the file you can do to the image instead, without doing something unintentional and dangerous to the file. When you are sure that the transformation is correct, you can then write the image content back.
Upvotes: 0
Reputation: 74227
I don't know anything about ReverseStreamReaders. The solution is [essentially] simple:
The devil is in the details, though, regarding that "read lines in reverse" part. There are some complicating factors that are likely to get you in trouble:
I'm not sure there's a good, easy solution outside of the obvious: read sequentially through the file and don't write the last twenty lines.
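The "obvious" sequential approach can be sketched roughly as follows. This is only an illustration, not code from the answer: the class and method names (TailDropper, CopyWithoutLastLines) are hypothetical, and the sketch assumes the file is text that StreamReader can decode with its default settings. It writes to a second file, so line endings are normalized to Environment.NewLine and the original stays untouched until you choose to replace it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TailDropper
{
    // Hypothetical helper: copies every line of sourcePath to destPath
    // except the last linesToDrop lines.
    public static void CopyWithoutLastLines(string sourcePath, string destPath, int linesToDrop)
    {
        // Holds the most recently read lines. A line is written out only
        // once linesToDrop newer lines have been read after it, so the
        // final linesToDrop lines are never written.
        var pending = new Queue<string>();
        using (var reader = new StreamReader(sourcePath))
        using (var writer = new StreamWriter(destPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                pending.Enqueue(line);
                if (pending.Count > linesToDrop)
                    writer.WriteLine(pending.Dequeue());
            }
        }
    }
}
```

Memory use stays bounded by the size of the dropped lines, but the whole file is read and rewritten once, which is the cost this approach accepts.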
Upvotes: 0
Reputation: 1105
There are two parts to the solution. For the first part, you need to read the memory map backwards to grab lines, until you have read the number of lines you want (20 in this case).
For the second part, you want to truncate the file by the last twenty lines (by setting them to string.Empty). I'm not sure if you can do this with a memory map. You may have to make a copy of the file somewhere and overwrite the original with the source data, except for the last xxx bytes (which represent the last twenty lines).
The code below will extract the last twenty lines and display them.
You'll also get the position (lastBytePos variable) where the last twenty lines begin. You can use that information to know where to truncate the file.
UPDATE: To truncate the file, call FileStream.SetLength(lastBytePos).
I wasn't sure what you meant by "the last 20 lines are bad". In case the disk is physically corrupt and the data cannot be read, I've added a badPositions list that holds the positions where the memory map had problems reading the data.
I don't have a +2GB file to test with, but it should work (fingers crossed).
using System;
using System.Collections.Generic;
using System.Text;
using System.IO.MemoryMappedFiles;
using System.IO;

namespace ConsoleApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            string filename = "textfile1.txt";
            long fileLen = new FileInfo(filename).Length;
            List<long> badPositions = new List<long>();
            List<byte> currentLine = new List<byte>();
            List<string> lines = new List<string>();
            bool lastReadByteWasLF = false;
            int linesToRead = 20;
            int linesRead = 0;
            long lastBytePos = fileLen;

            //Dispose both the map and the view when done
            using (MemoryMappedFile mapFile = MemoryMappedFile.CreateFromFile(filename, FileMode.Open))
            using (MemoryMappedViewAccessor view = mapFile.CreateViewAccessor())
            {
                for (long i = fileLen - 1; i >= 0; i--) //iterate backwards
                {
                    try
                    {
                        byte b = view.ReadByte(i);
                        lastBytePos = i;
                        switch (b)
                        {
                            case 13: //CR
                                if (lastReadByteWasLF)
                                {
                                    //A line has been read
                                    var bArray = currentLine.ToArray();
                                    if (bArray.LongLength > 1)
                                    {
                                        //Add line string to lines collection (skip the leading LF)
                                        lines.Insert(0, Encoding.UTF8.GetString(bArray, 1, bArray.Length - 1));
                                        //Clear current line list
                                        currentLine.Clear();
                                        //Add CRLF to currentLine -- comment this out if you don't want CRLFs in lines
                                        currentLine.Add(13);
                                        currentLine.Add(10);
                                        linesRead++;
                                    }
                                }
                                lastReadByteWasLF = false;
                                break;
                            case 10: //LF
                                lastReadByteWasLF = true;
                                currentLine.Insert(0, b);
                                break;
                            default:
                                lastReadByteWasLF = false;
                                currentLine.Insert(0, b);
                                break;
                        }
                        if (linesToRead == linesRead)
                        {
                            //i is the CR of the preceding CRLF; the wanted lines begin right after it
                            lastBytePos = i + 2;
                            break;
                        }
                    }
                    catch
                    {
                        //Record unreadable positions and substitute a placeholder byte
                        lastReadByteWasLF = false;
                        currentLine.Insert(0, (byte) '?');
                        badPositions.Insert(0, i);
                    }
                }
            }

            if (linesToRead > linesRead)
            {
                //Read last line (the first line of the file was reached)
                var bArray = currentLine.ToArray();
                if (bArray.LongLength > 1)
                {
                    //Add line string to lines collection
                    lines.Insert(0, Encoding.UTF8.GetString(bArray));
                    linesRead++;
                }
            }

            //Print results
            lines.ForEach(o => Console.WriteLine(o));
            Console.ReadKey();
        }
    }
}
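The truncation step mentioned in the UPDATE could look something like the sketch below. The helper name (Truncator.TruncateAt) is hypothetical; the only call taken from the answer is FileStream.SetLength. It assumes the memory-mapped file and its views have already been disposed, since on Windows a file generally cannot be shortened while a mapping of it is still open.

```csharp
using System;
using System.IO;

public static class Truncator
{
    // Hypothetical helper: cuts the file off at newLength bytes.
    // Everything from that offset to the end of the file is discarded.
    public static void TruncateAt(string path, long newLength)
    {
        if (newLength < 0)
            throw new ArgumentOutOfRangeException("newLength");

        using (var file = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            // Only ever shrink; a larger value would extend the file instead.
            if (newLength < file.Length)
                file.SetLength(newLength);
        }
    }
}
```

Passing the offset at which the unwanted lines begin removes them in place, without copying the rest of the file.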
Upvotes: 0
Reputation: 138841
Memory-mapped files can be a problem for big files (typically files whose size is equal to or bigger than the RAM) if you end up mapping the whole file. If you map only the end, that should not be a real issue.
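Mapping only the end of the file might look like this minimal sketch. The names (TailMapper, ReadTail, maxBytes) are made up for illustration; the calls used are MemoryMappedFile.CreateFromFile, CreateViewAccessor with an offset and size, and ReadArray. The assumption is that the lines of interest fit within the last maxBytes bytes.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

public static class TailMapper
{
    // Hypothetical helper: returns up to maxBytes bytes from the end of the
    // file, read through a view that covers only that region.
    public static byte[] ReadTail(string path, int maxBytes)
    {
        long fileLength = new FileInfo(path).Length;
        long size = Math.Min(fileLength, (long)maxBytes);
        if (size <= 0)
            return new byte[0]; // an empty file cannot be mapped

        long offset = fileLength - size;
        var buffer = new byte[size];
        using (var map = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = map.CreateViewAccessor(offset, size, MemoryMappedFileAccess.Read))
        {
            // Position 0 of the view corresponds to 'offset' in the file.
            view.ReadArray(0, buffer, 0, buffer.Length);
        }
        return buffer;
    }
}
```

Because the view spans only size bytes, the address space it consumes is independent of the total file length.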
Anyway, here is a C# implementation that does not use a memory-mapped file, but a regular FileStream. It is based on a ReverseStreamReader implementation (code also included). I would be curious to see it compared to other MMF solutions in terms of performance and memory consumption.
using System;
using System.IO;
using System.Text;

public static void OverwriteEndLines(string filePath, int linesToStrip)
{
    if (filePath == null)
        throw new ArgumentNullException("filePath");

    if (linesToStrip <= 0)
        return;

    using (FileStream file = new FileStream(filePath, FileMode.Open, FileAccess.ReadWrite))
    {
        using (ReverseStreamReader reader = new ReverseStreamReader(file))
        {
            int count = 0;
            do
            {
                string line = reader.ReadLine();
                if (line == null) // end of file
                    break;

                count++;
                if (count == linesToStrip)
                {
                    // write CR LF
                    for (int i = 0; i < linesToStrip; i++)
                    {
                        file.WriteByte((byte)'\r');
                        file.WriteByte((byte)'\n');
                    }

                    // truncate file to current stream position
                    file.SetLength(file.Position);
                    break;
                }
            }
            while (true);
        }
    }
}

// NOTE: we have not implemented all ReadXXX methods
public class ReverseStreamReader : StreamReader
{
    private bool _returnEmptyLine;

    public ReverseStreamReader(Stream stream)
        : base(stream)
    {
        // start at the end of the stream and work backwards
        BaseStream.Seek(0, SeekOrigin.End);
    }

    // Reads the byte before the current position and leaves the position on it
    public override int Read()
    {
        if (BaseStream.Position == 0)
            return -1;

        BaseStream.Seek(-1, SeekOrigin.Current);
        int i = BaseStream.ReadByte();
        BaseStream.Seek(-1, SeekOrigin.Current);
        return i;
    }

    public override string ReadLine()
    {
        if (BaseStream.Position == 0)
        {
            if (_returnEmptyLine)
            {
                _returnEmptyLine = false;
                return string.Empty;
            }
            return null;
        }

        int read;
        StringBuilder sb = new StringBuilder();
        while ((read = Read()) >= 0)
        {
            if (read == '\n')
            {
                read = Read();
                // supports windows & unix format
                if ((read > 0) && (read != '\r'))
                {
                    BaseStream.Position++;
                }
                else if (BaseStream.Position == 0)
                {
                    // handle the special empty first line case
                    _returnEmptyLine = true;
                }
                break;
            }
            sb.Append((char)read);
        }

        // reverse string. Note this is optional if we don't really need string content
        if (sb.Length > 1)
        {
            char[] array = new char[sb.Length];
            sb.CopyTo(0, array, 0, array.Length);
            Array.Reverse(array);
            return new string(array);
        }
        return sb.ToString();
    }
}
Upvotes: 3
Reputation: 631
From the question, it sounds like you need to have a memory-mapped file. However, there is a way to do this without one.
Open the file normally, then move the file pointer to the end of the file. Once you are at the end, read the file in reverse (decrement the file pointer after each read) until you have the desired number of characters.
The cool way: load the characters into an array in reverse as well, so you do not have to reverse them once you are done reading.
Apply the fix to the array, then write it back. Close, Flush, Complete!
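A rough sketch of the backwards read with a plain FileStream is shown below. The names (TailFinder, FindTailOffset) are hypothetical, and it reads in 4096-byte chunks rather than one character at a time, which is a substitution for speed and not what the steps above literally describe. It assumes lines end in '\n' (which also covers "\r\n") and an ASCII-compatible encoding such as UTF-8.

```csharp
using System;
using System.IO;

public static class TailFinder
{
    // Hypothetical helper: returns the byte offset at which the last
    // lineCount lines begin, or 0 if the file has no more than that many lines.
    public static long FindTailOffset(string path, int lineCount)
    {
        using (var file = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            if (lineCount <= 0)
                return file.Length;

            var chunk = new byte[4096];
            long pos = file.Length;
            int newlines = 0;
            while (pos > 0)
            {
                // Step back one chunk and read it completely.
                int toRead = (int)Math.Min((long)chunk.Length, pos);
                pos -= toRead;
                file.Seek(pos, SeekOrigin.Begin);
                int got = 0;
                while (got < toRead)
                {
                    int n = file.Read(chunk, got, toRead - got);
                    if (n == 0)
                        throw new EndOfStreamException();
                    got += n;
                }

                // Scan the chunk from its end towards its start.
                for (int i = toRead - 1; i >= 0; i--)
                {
                    if (chunk[i] != (byte)'\n')
                        continue;
                    long newlinePos = pos + i;
                    // A newline that is the very last byte only terminates
                    // the final line; it does not separate two lines.
                    if (newlinePos == file.Length - 1)
                        continue;
                    if (++newlines == lineCount)
                        return newlinePos + 1;
                }
            }
            return 0;
        }
    }
}
```

The returned offset can then be used to seek and overwrite the bad lines, or passed to FileStream.SetLength to drop them.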
Upvotes: 1