Reputation: 13945
Here's what I'm dealing with...
Some process (out of our control) will occasionally drop a zip file into a directory in Azure File Storage. That directory name is InBound. So let's say a file called bigbook.zip is dropped into the InBound folder.
I need to create an Azure Function App that runs every 5 minutes and looks for zip files in the InBound directory. If any exist, then one by one, we create a new directory with the same name as the zip file inside another directory (called InProcess). So in our example, I would create InProcess/bigbook.
Now inside InProcess/bigbook, I need to unzip bigbook.zip. So by the time the process is done running, InProcess/bigbook will contain all the contents of bigbook.zip.
Please note: the function I am creating is a Console App that will run as an Azure Function App, so there will be no file system access (at least as far as I'm aware, anyway). There is no option to download the zip file, unzip it, and then move the contents.
I am having a devil of a time figuring out how to do this in memory only. No matter what I try, I keep running into an OutOfMemoryException. For now, I am just running this on my localhost in debug in Visual Studio 2017 on .NET 4.7, and in that setting I am not able to process the test zip file, which is 515,069 KB (roughly 500 MB).
This was my first attempt:
private async Task<MemoryStream> GetMemoryStreamAsync(CloudFile inBoundfile)
{
    MemoryStream memstream = new MemoryStream();
    await inBoundfile.DownloadToStreamAsync(memstream).ConfigureAwait(false);
    return memstream;
}
And this (with high hopes) was my second attempt, thinking that DownloadRangeToStream would work better than just DownloadToStream.
private MemoryStream GetMemoryStreamByRange(CloudFile inBoundfile)
{
    MemoryStream outPutStream = new MemoryStream();
    inBoundfile.FetchAttributes();
    int bufferLength = 1 * 1024 * 1024; // 1 MB chunk
    long blobRemainingLength = inBoundfile.Properties.Length;
    long offset = 0;
    while (blobRemainingLength > 0)
    {
        long chunkLength = (long)Math.Min(bufferLength, blobRemainingLength);
        using (var ms = new MemoryStream())
        {
            inBoundfile.DownloadRangeToStream(ms, offset, chunkLength);
            lock (outPutStream)
            {
                outPutStream.Position = offset;
                var bytes = ms.ToArray();
                outPutStream.Write(bytes, 0, bytes.Length);
            }
        }
        offset += chunkLength;
        blobRemainingLength -= chunkLength;
    }
    return outPutStream;
}
But either way, I am running into memory issues. I presume it's because the MemoryStream I am trying to create gets too large?
How else can I tackle this? And again, downloading the zip file is not an option, as the app will ultimately be an Azure Function App. I'm also pretty sure that using a FileStream isn't an option either, as that requires a local file path, which I don't have. (I only have a remote Azure URL)
Could I somehow create a temp file in the same Azure Storage account that the zip file is in, and stream the zip file to that temp file instead of to a memory stream? (Thinking out loud.)
The goal is to get the stream into a ZipArchive using:
ZipArchive archive = new ZipArchive(stream)
And from there I can extract all the contents. But getting to that point w/o memory errors is proving a real bugger.
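For reference, once I have a usable stream, the extraction itself would be roughly this (simplified; inProcessDir is just a placeholder for the InProcess/bigbook target directory, not real code):
// Placeholder sketch of the extraction I'm aiming for.
using (ZipArchive archive = new ZipArchive(stream, ZipArchiveMode.Read))
{
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        CloudFile target = inProcessDir.GetFileReference(entry.Name);
        using (Stream entryStream = entry.Open())
        using (Stream fileStream = target.OpenWrite(entry.Length))
        {
            entryStream.CopyTo(fileStream);
        }
    }
}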
Any ideas?
Upvotes: 3
Views: 5254
Reputation: 365
Using an Azure Storage file share, this is the only way it worked for me without loading the entire ZIP into memory. I tested it with a 3 GB ZIP file (containing thousands of files, or one big file) and memory/CPU stayed low and stable. I hope it helps!
var zipFiles = _directory.ListFilesAndDirectories()
                         .OfType<CloudFile>()
                         .Where(x => x.Name.ToLower().Contains(".zip"))
                         .ToList();

foreach (var zipFile in zipFiles)
{
    // OpenRead returns a seekable stream over the remote file, so ZipArchive
    // can read the central directory without buffering the whole ZIP in memory.
    using (var zipArchive = new ZipArchive(zipFile.OpenRead()))
    {
        foreach (var entry in zipArchive.Entries)
        {
            if (entry.Length > 0)
            {
                CloudFile extractedFile = _directory.GetFileReference(entry.Name);

                // Copy each entry to the target file in 16 KB chunks.
                using (var entryStream = entry.Open())
                {
                    byte[] buffer = new byte[16 * 1024];
                    using (var ms = extractedFile.OpenWrite(entry.Length))
                    {
                        int read;
                        while ((read = entryStream.Read(buffer, 0, buffer.Length)) > 0)
                        {
                            ms.Write(buffer, 0, read);
                        }
                    }
                }
            }
        }
    }
}
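If you also need the per-archive folder from the question (InProcess/bigbook), the target directory can be derived from the zip name before the entry loop. A rough sketch, assuming inProcessRoot is a CloudFileDirectory pointing at InProcess:
// Sketch only: inProcessRoot is an assumed CloudFileDirectory for "InProcess".
string folderName = Path.GetFileNameWithoutExtension(zipFile.Name); // "bigbook" for bigbook.zip
CloudFileDirectory targetDir = inProcessRoot.GetDirectoryReference(folderName);
targetDir.CreateIfNotExists();

// Then resolve each entry against targetDir instead of _directory:
CloudFile extractedFile = targetDir.GetFileReference(entry.Name);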
Upvotes: 3
Reputation: 3456
I would suggest you use memory snapshots to see why you are running out of memory within Visual Studio. You can use the tutorial in this article to find the culprit. Doing local development with a smaller file may help you continue to work if your machine is simply running out of memory.
When it comes to doing this within Azure, a node in the Consumption plan is limited to 1.5 GB of total memory. If you expect to receive files larger than that, you should look at one of the other App Service plans that give you more memory to work with.
It is possible to store files within the function's local directory, so that is an option. You can't guarantee that you will be using the same local directory between executions, but this should work as long as you are using the file you downloaded within the same execution.
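A minimal sketch of that local-disk approach, assuming inBoundFile and inProcessDir are existing CloudFile/CloudFileDirectory references, the archive has no subdirectories, and the extracted contents fit in the function's local temp space (which is itself limited on a Consumption plan):
// Sketch only: inBoundFile (CloudFile) and inProcessDir (CloudFileDirectory) are assumed.
string workDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString());
Directory.CreateDirectory(workDir);

string localZip = Path.Combine(workDir, inBoundFile.Name);
inBoundFile.DownloadToFile(localZip, FileMode.Create);   // pull the zip down once

string extractDir = Path.Combine(workDir, "extracted");
ZipFile.ExtractToDirectory(localZip, extractDir);        // System.IO.Compression.ZipFile

foreach (string path in Directory.GetFiles(extractDir))
{
    // Upload each extracted file back into the share.
    CloudFile target = inProcessDir.GetFileReference(Path.GetFileName(path));
    target.UploadFromFile(path);
}

Directory.Delete(workDir, true);                         // clean up local storage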
Upvotes: 0