Reputation: 5623
End goal: Users upload a large number of files of different sizes to my web site, and I don't want duplicate files on disk.
The solution I have been using is a simple SHA-1 hash of the file once it has been uploaded, with code like this:
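public static string HashFile(string FileName)
{
    using (FileStream stream = File.OpenRead(FileName))
    using (SHA1Managed sha = new SHA1Managed()) // dispose the hasher when done
    {
        // Reads the whole file from disk and hashes it in one pass.
        byte[] checksum = sha.ComputeHash(stream);
        // Convert the hash bytes to a hex string without dashes.
        string sendCheckSum = BitConverter.ToString(checksum).Replace("-", string.Empty);
        return sendCheckSum;
    }
}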
This works fine for smaller files, but it is a big pain when the file is 30 GB. So I would like to hash the file as I am receiving it from the client. I get the file from the client in chunks, and the chunk size is not always the same.
Code that receives the file:
int chunk = context.Request["chunk"] != null ? int.Parse(context.Request["chunk"]) : 0;
int chunks = context.Request["chunks"] != null ? int.Parse(context.Request["chunks"]) : 0;
string fileName = context.Request["name"] != null ? context.Request["name"] : string.Empty;
HttpPostedFile fileUpload = context.Request.Files[0];
string fullFilePath = Path.Combine(SiteSettings.UploadTempFolder, fileName);
using (var fs = new FileStream(fullFilePath, chunk == 0 ? FileMode.Create : FileMode.Append))
{
    var buffer = new byte[fileUpload.InputStream.Length];
    fileUpload.InputStream.Read(buffer, 0, buffer.Length);
    fs.Write(buffer, 0, buffer.Length);
    // Here I want the hash, while I have the file data in memory.
}
Upvotes: 0
Views: 1054
Reputation: 5623
This is a cut and paste from:
Compute a hash from a stream of unknown length in C#
MD5, like other hash functions, does not require two passes.
To start:
HashAlgorithm hasher = ..;
hasher.Initialize();
As each block of data arrives:
byte[] buffer = ..;
int bytesReceived = ..;
hasher.TransformBlock(buffer, 0, bytesReceived, null, 0);
To finish and retrieve the hash:
hasher.TransformFinalBlock(new byte[0], 0, 0);
byte[] hash = hasher.Hash;
This pattern works for any type derived from HashAlgorithm, including MD5CryptoServiceProvider and SHA1Managed.
HashAlgorithm also defines a method ComputeHash which takes a Stream object; however, this method will block the thread until the stream is consumed. Using the TransformBlock approach allows an "asynchronous hash" that is computed as data arrives without using up a thread.
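Putting the three steps together, a minimal sketch might look like the following. The names IncrementalHashDemo and HashChunks are made up for illustration, and SHA1.Create() is used as the concrete HashAlgorithm. The sketch assumes all chunks are fed to the same hasher instance inside one process; if the chunks arrive in separate HTTP requests, the hasher would have to be kept alive between them somehow, which is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class IncrementalHashDemo
{
    // Hypothetical helper: feeds each chunk to the hasher as it "arrives"
    // and returns the final digest as an uppercase hex string.
    public static string HashChunks(IEnumerable<byte[]> chunks)
    {
        using (HashAlgorithm hasher = SHA1.Create())
        {
            foreach (byte[] chunk in chunks)
            {
                // A null output buffer is allowed; only the hash state is updated.
                hasher.TransformBlock(chunk, 0, chunk.Length, null, 0);
            }

            // An empty final block closes the hash computation.
            hasher.TransformFinalBlock(new byte[0], 0, 0);
            return BitConverter.ToString(hasher.Hash).Replace("-", string.Empty);
        }
    }

    public static void Main()
    {
        // The same bytes, once split into uneven chunks and once as a whole.
        var chunks = new List<byte[]>
        {
            Encoding.ASCII.GetBytes("hello"),
            Encoding.ASCII.GetBytes(" wor"),
            Encoding.ASCII.GetBytes("ld"),
        };
        byte[] whole = Encoding.ASCII.GetBytes("hello world");

        string incremental = HashChunks(chunks);
        string oneShot;
        using (var sha = SHA1.Create())
        {
            oneShot = BitConverter.ToString(sha.ComputeHash(whole)).Replace("-", string.Empty);
        }

        // Chunk boundaries do not affect the result, so this prints True.
        Console.WriteLine(incremental == oneShot);
    }
}
```

Since the chunk boundaries do not matter, the varying chunk sizes from the upload client should not be a problem for this approach.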
Upvotes: 0
Reputation: 63772
You can always create your own stream :)
public class ActionStream : Stream
{
    private readonly Stream _innerStream;
    private readonly Action<byte[], int, int> _readAction;

    public ActionStream(Stream innerStream, Action<byte[], int, int> readAction)
    {
        _innerStream = innerStream;
        _readAction = readAction;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => _innerStream.Length;

    public override long Position
    {
        get { return _innerStream.Position; }
        set { throw new NotSupportedException(); }
    }

    public override void Flush() { }

    public override int Read(byte[] buffer, int offset, int count)
    {
        var bytesRead = _innerStream.Read(buffer, offset, count);
        _readAction(buffer, offset, bytesRead);
        return bytesRead;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotSupportedException();
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _innerStream.Dispose();
        }
        base.Dispose(disposing);
    }

    public override void SetLength(long value) { throw new NotSupportedException(); }

    public override void Write(byte[] buffer, int offset, int count)
    {
        throw new NotSupportedException();
    }
}
This allows you to bind together the two stream operations you're doing:
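using (var fs = new FileStream(path, chunk == 0 ? FileMode.Create : FileMode.Append))
using (var sha = new SHA1Managed())
{
    // "as" is a reserved keyword in C#, so the variable needs another name.
    var actionStream = new ActionStream(fileUpload.InputStream,
        (buffer, offset, bytesRead) =>
        {
            // Every block the hasher reads is also written to the file.
            fs.Write(buffer, offset, bytesRead);
        });
    var checksum = sha.ComputeHash(actionStream);
}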
This assumes that SHA1Managed reads through every single byte of the input stream in order - you should check that. I'm pretty sure that is how it works, though :)
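One way to check that assumption is a small test like the sketch below. ReadTapStream is a compact stand-in for the ActionStream above, and TapCheck / CopyMatches are made-up names; SHA1.Create() is used instead of SHA1Managed, on the assumption that both go through the same ComputeHash(Stream) logic. The check hashes an in-memory buffer through the tap and compares the bytes the tap saw with the original data.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical stand-in for ActionStream: forwards reads and reports each block read.
public class ReadTapStream : Stream
{
    private readonly Stream _inner;
    private readonly Action<byte[], int, int> _onRead;

    public ReadTapStream(Stream inner, Action<byte[], int, int> onRead)
    {
        _inner = inner;
        _onRead = onRead;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => _inner.Length;
    public override long Position
    {
        get { return _inner.Position; }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override int Read(byte[] buffer, int offset, int count)
    {
        int n = _inner.Read(buffer, offset, count);
        if (n > 0) _onRead(buffer, offset, n); // report only blocks that contain data
        return n;
    }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

public static class TapCheck
{
    // Returns true when the bytes seen by the tap equal the input, in order,
    // and the digest computed through the tap equals a direct digest of the input.
    public static bool CopyMatches(byte[] data)
    {
        using (var source = new MemoryStream(data))
        using (var copy = new MemoryStream())
        using (var viaTapHasher = SHA1.Create())
        using (var directHasher = SHA1.Create())
        {
            var tap = new ReadTapStream(source, (buf, off, n) => copy.Write(buf, off, n));
            byte[] viaTap = viaTapHasher.ComputeHash(tap);
            byte[] direct = directHasher.ComputeHash(data);
            return copy.ToArray().SequenceEqual(data) && viaTap.SequenceEqual(direct);
        }
    }

    public static void Main()
    {
        // Larger than a typical internal read buffer, so several reads are needed.
        var data = new byte[100000];
        new Random(1).NextBytes(data);
        Console.WriteLine(CopyMatches(data));
    }
}
```

If the hasher skipped or reordered any part of the stream, the copied bytes would differ from the input and the check would return false.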
Upvotes: 2