Reputation: 1571
I have a WinForms application where the user enters some text, and the application tries to match it against a specific line (or block of lines) that starts with that input in a large file (about 5 GB).
The lines are sorted alphabetically, so I perform a binary search and identify the specific line in O(log n) time without loading the file into memory. For easier navigation in the file, all lines have the same size (padded with spaces).
using (var file = File.Open(filePath, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    // Lines are fixed-width, so line i starts at byte offset i * lineLength.
    long left = 0;
    long right = fileLength / lineLength - 1;
    byte[] buffer = new byte[lineLength];
    bool found = false;
    while (left <= right)
    {
        var middle = left + (right - left) / 2;
        file.Position = middle * lineLength; // seek to the start of the middle line
        int read = file.Read(buffer, 0, lineLength);
        string line = Encoding.UTF8.GetString(buffer, 0, read);
        if (line.StartsWith(term))
        {
            found = true;
            break;
        }
        else if (string.Compare(line, term) < 0)
        {
            left = middle + 1; // the match, if any, is in the upper half
        }
        else
        {
            right = middle - 1; // the match, if any, is in the lower half
        }
    }
    if (found)
    {
        ....
The only 'expensive' operation in the code is setting file.Position, which jumps between different parts of the file (always to the start of some line) until it finds the specific line. But there cannot be more than about 20 jumps (log2 of the total number of lines) per search.
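As a rough check of that jump count, the sketch below computes the worst-case number of probes for a binary search, floor(log2(lineCount)) + 1, from the file size and the line length. ProbeEstimate and MaxProbes are hypothetical names, and the 100-byte line length used in the note afterwards is an assumed figure for illustration only, since the real line length is not stated here.

```csharp
using System;

static class ProbeEstimate
{
    // Worst-case number of probes for a binary search over the fixed-width
    // lines of a file: floor(log2(lineCount)) + 1, i.e. the bit length of
    // lineCount. Returns 0 for an empty file.
    public static int MaxProbes(long fileLength, int lineLength)
    {
        long lineCount = fileLength / lineLength;
        int probes = 0;
        while (lineCount > 0)
        {
            lineCount >>= 1; // halve the remaining search space
            probes++;
        }
        return probes;
    }
}
```

Under that assumption, ProbeEstimate.MaxProbes(5_000_000_000L, 100) returns 26 (50 million lines), so a 5 GB file may need a few more than 20 jumps, but the count stays small either way.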
The WinForms version is very fast because the file is on the same machine as the executable.
I want to move this tool to Azure using an Azure Function. I would like to continue to use the same FileStream logic for accessing the file.
I suspect that blob storage is not necessarily on the same machine as the Azure Function, so file.Read may become a (slow) HTTP call to a different machine and make my search slower by orders of magnitude.
Where should I place the large file so that I can search it quickly as soon as the request arrives? Can the file be on the same machine that executes the Azure Function?
Update
Is it possible to include the file (as an embedded resource) in my Azure Function project? What is the size limit for that?
Upvotes: 1
Views: 267
Reputation: 17534
Access (read/write/...) to blob (or other cloud) storage is usually implemented as a REST API (not the OS/file-system APIs that FileStream.Read() would use). You can emulate a file system by mounting blob/cloud storage using some gimmicks (e.g. FUSE, or what silent posted if you're dealing with a "File Share"), but support and performance would be very questionable, as it's an emulation that calls the REST APIs behind the scenes.
The equivalent of a binary search (which needs random access to the file) can be implemented using the range options of the REST API for reading a blob (the Range or x-ms-range request header of Get Blob), so that each probe downloads only one line.
All language SDKs are built on top of these REST APIs. Maybe one of the C# SDK APIs provides a wrapper or parameter for reading a range, so you can use it in your code. If not, you'll have to call the REST API directly.
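To illustrate the idea, here is a minimal sketch of the binary search written against an abstract "read a byte range" callback instead of a FileStream. The names RangeSearch, FindFirstLineStartingWith and readRange are hypothetical. readRange(offset, count) stands in for whatever performs the ranged read (a ranged blob request in Azure, or a seek plus read on a local file). The sketch assumes the lines are sorted in ordinal order, and, unlike the code in the question, it searches for the first line of the matching block rather than stopping at any match.

```csharp
using System;
using System.Text;

static class RangeSearch
{
    // Returns the index of the first fixed-width line that starts with `term`,
    // or -1 if no line does. readRange(offset, count) is a hypothetical callback
    // that returns `count` bytes starting at byte `offset`.
    // Assumption: lines are sorted by ordinal comparison.
    public static long FindFirstLineStartingWith(
        Func<long, int, byte[]> readRange, long totalLength, int lineLength, string term)
    {
        long lineCount = totalLength / lineLength;
        long left = 0;
        long right = lineCount; // exclusive upper bound

        // Lower-bound search: find the first line that is >= term.
        while (left < right)
        {
            long middle = left + (right - left) / 2;
            string line = Encoding.UTF8.GetString(readRange(middle * lineLength, lineLength));
            if (string.CompareOrdinal(line, term) < 0)
                left = middle + 1; // line sorts before term: look in the upper half
            else
                right = middle;    // line sorts at or after term: keep it as a candidate
        }

        if (left >= lineCount)
            return -1; // every line sorts before term

        // Any line starting with term sorts at or after term, so the first such
        // line, if one exists, is exactly the lower bound found above.
        string candidate = Encoding.UTF8.GetString(readRange(left * lineLength, lineLength));
        return candidate.StartsWith(term, StringComparison.Ordinal) ? left : -1;
    }
}
```

Each loop iteration issues one ranged read of lineLength bytes, so the cost of a search is roughly the number of probes multiplied by the round-trip latency to the storage, plus one final read to confirm the match.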
Note that
Upvotes: 1