EKS

Reputation: 5623

Hash a file as it's being received

End goal: Users upload a large number of files of varying sizes to my web site, and I don't want duplicate files on the disk.

The solution I have been using is a simple SHA1 hash of the file once it has been uploaded, with code like this:

public static string HashFile(string fileName)
{
    using (FileStream stream = File.OpenRead(fileName))
    using (SHA1Managed sha = new SHA1Managed())
    {
        // ComputeHash reads the whole stream, so this is a second pass over the uploaded file.
        byte[] checksum = sha.ComputeHash(stream);

        // Hex string without the dashes that BitConverter inserts.
        return BitConverter.ToString(checksum).Replace("-", string.Empty);
    }
}

This "works" fine for smaller files, but it's a big pain when the file is 30 GB. So I would like to hash the file as I'm receiving it from the client. I get the file from the client in "chunks", and the size of a chunk is not always the same.

Code that receives the file:

int chunk = context.Request["chunk"] != null ? int.Parse(context.Request["chunk"]) : 0;
int chunks = context.Request["chunks"] != null ? int.Parse(context.Request["chunks"]) : 0;
string fileName = context.Request["name"] != null ? context.Request["name"] : string.Empty;

HttpPostedFile fileUpload = context.Request.Files[0];    
string fullFilePath = Path.Combine(SiteSettings.UploadTempFolder, fileName);
using (var fs = new FileStream(fullFilePath, chunk == 0 ? FileMode.Create : FileMode.Append))
{
    var buffer = new byte[fileUpload.InputStream.Length];
    fileUpload.InputStream.Read(buffer, 0, buffer.Length);

    fs.Write(buffer, 0, buffer.Length);
    // Here I want the hash, while I have the file data in memory.
}

Upvotes: 0

Views: 1054

Answers (2)

EKS

Reputation: 5623

This is a cut and paste from:

Compute a hash from a stream of unknown length in C#

MD5, like other hash functions, does not require two passes.

To start:

HashAlgorithm hasher = ..;
hasher.Initialize();

As each block of data arrives:

byte[] buffer = ..;
int bytesReceived = ..;
hasher.TransformBlock(buffer, 0, bytesReceived, null, 0);

To finish and retrieve the hash:

hasher.TransformFinalBlock(new byte[0], 0, 0);
byte[] hash = hasher.Hash;

This pattern works for any type derived from HashAlgorithm, including MD5CryptoServiceProvider and SHA1Managed.

HashAlgorithm also defines a method ComputeHash which takes a Stream object; however, this method will block the thread until the stream is consumed. Using the TransformBlock approach allows an "asynchronous hash" that is computed as data arrives without using up a thread.
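
The three steps above can be put together in one small helper. This is only a sketch under a few assumptions: the class name ChunkedHashDemo and its methods are made up for illustration, SHA1 is chosen to match the question, and the chunks are handed over as an array instead of arriving over HTTP. In the chunked upload from the question each chunk arrives in its own request, so the hasher instance would have to be kept somewhere that survives between requests; that part is not shown here.

```csharp
using System;
using System.Security.Cryptography;

public static class ChunkedHashDemo
{
    // Feeds each chunk to the hasher in arrival order and returns the digest as hex.
    public static string HashChunks(byte[][] chunks)
    {
        using (HashAlgorithm hasher = SHA1.Create())
        {
            foreach (byte[] chunk in chunks)
            {
                // A null output buffer means nothing is copied; only the hash state is updated.
                hasher.TransformBlock(chunk, 0, chunk.Length, null, 0);
            }

            // An empty final block just closes the hash computation.
            hasher.TransformFinalBlock(new byte[0], 0, 0);
            return BitConverter.ToString(hasher.Hash).Replace("-", string.Empty);
        }
    }

    // One-shot hash of the same bytes, for comparison with HashChunks.
    public static string HashWhole(byte[] data)
    {
        using (HashAlgorithm hasher = SHA1.Create())
        {
            return BitConverter.ToString(hasher.ComputeHash(data)).Replace("-", string.Empty);
        }
    }
}
```

Splitting "hello world" into the chunks "hello " and "world" and passing them to HashChunks should give the same string as HashWhole on the unsplit bytes, since the digest depends only on the byte sequence and not on where the chunk boundaries fall.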

Upvotes: 0

Luaan

Reputation: 63772

You can always create your own stream :)

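using System;
using System.IO;

// Read-only wrapper that hands every block read from the inner stream to a callback.
public class ActionStream : Stream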
{
    private readonly Stream _innerStream;
    private readonly Action<byte[], int, int> _readAction;

    public ActionStream(Stream innerStream, Action<byte[], int, int> readAction)
    {
        _innerStream = innerStream;
        _readAction = readAction;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => _innerStream.Length;
    public override long Position
    {
        get { return _innerStream.Position; }
        set { throw new NotSupportedException(); }
    }

    public override void Flush() { }

    public override int Read(byte[] buffer, int offset, int count)
    {
        var bytesRead = _innerStream.Read(buffer, offset, count);

        _readAction(buffer, offset, bytesRead);

        return bytesRead;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotSupportedException();
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _innerStream.Dispose();
        }

        base.Dispose(disposing);
    }

    public override void SetLength(long value) { throw new NotSupportedException(); }

    public override void Write(byte[] buffer, int offset, int count)
    {
        throw new NotSupportedException();
    }
}

This allows you to bind together the two stream operations you're doing:

using (var fs = new FileStream(path, chunk == 0 ? FileMode.Create : FileMode.Append))
using (var sha = new SHA1Managed())
{
  var actionStream = new ActionStream(fileUpload.InputStream,
    (buffer, offset, bytesRead) =>
    {
      // Write each block to disk as the hasher pulls it through the stream.
      fs.Write(buffer, offset, bytesRead);
    });

  var checksum = sha.ComputeHash(actionStream);
}

This assumes that SHA1Managed reads through every single byte of the input stream in order - you should check that. I'm pretty sure that is how it works, though :)
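
If you would rather not maintain a custom stream class, a similar effect can probably be had with the framework's CryptoStream, which accepts a HashAlgorithm as its transform. Note that this is a different technique from the ActionStream above, not a drop-in replacement. The sketch below is a minimal illustration: CopyAndHashDemo and CopyAndHash are hypothetical names, and any two streams can stand in for the upload stream and the target file.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class CopyAndHashDemo
{
    // Copies source to destination and returns the SHA1 of the copied bytes as hex.
    // Disposing the CryptoStream also disposes the source stream.
    public static string CopyAndHash(Stream source, Stream destination)
    {
        using (SHA1 sha = SHA1.Create())
        {
            // In Read mode, every byte pulled through the CryptoStream is also fed to the hasher,
            // and a hash transform passes the data through unchanged.
            using (var hashing = new CryptoStream(source, sha, CryptoStreamMode.Read))
            {
                hashing.CopyTo(destination);
            }

            // The hash is finalized once the end of the source has been reached.
            return BitConverter.ToString(sha.Hash).Replace("-", string.Empty);
        }
    }
}
```

As with the ActionStream version, this hashes only what flows through one source stream, so it covers a single chunk unless the hash state is carried across requests some other way.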

Upvotes: 2
