sunil20000

Reputation: 356

How to sort and search a very large file that can't be loaded into memory?

In a recent interview I was asked how to search for a string in a very large text file that can't be loaded into main memory (RAM), for example a 1 TB file on a machine with 1 GB of RAM.

I found many articles about this, a few even on Stack Overflow, but none of them were very convincing. For searching, people suggest processing the data in chunks or reading the file line by line. What I couldn't find answers to was: how do you read line by line without loading the whole file? What happens if the search string spans multiple chunks (part of it at the end of one chunk and the rest in the next, possibly across many chunks if the search string is long)? And on what basis do we choose the chunk size? I have many small questions like these because the interviewer asked them too.
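Reading line by line without loading the whole file can be sketched with File.ReadLines, which returns a lazily evaluated sequence: it reads one line at a time through a buffered reader instead of loading everything up front (unlike File.ReadAllLines). This is a minimal sketch; the names LineSearch and FindLines are hypothetical, and it assumes lines are reasonably short (each line is still materialized as a string) and that a match never spans a line break.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LineSearch
{
    // Hypothetical helper: yields (lineNumber, line) for every line that contains pattern.
    // File.ReadLines streams the file, so only the current line plus a small read buffer is in memory.
    public static IEnumerable<(long LineNumber, string Line)> FindLines(string path, string pattern)
    {
        long lineNumber = 0;
        foreach (string line in File.ReadLines(path))
        {
            lineNumber++;
            // Ordinal comparison: plain character-by-character match, no culture rules.
            if (line.IndexOf(pattern, StringComparison.Ordinal) >= 0)
                yield return (lineNumber, line);
        }
    }
}
```

A line-based search cannot find a string that contains a newline, and a file with one enormous line would still be loaded in full, which is why the byte-chunk approach is the more general answer.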

I believe this is no longer a hypothetical question, so if anyone has actually implemented this or has a fair idea of how to do it, please share your thoughts. Many thanks in advance.

If possible, please suggest an algorithm or some code in C#/.NET.
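For the chunk-boundary problem, a common technique is to carry the last (pattern length - 1) bytes of each chunk over to the front of the next buffer. Any match that straddles a boundary then lies entirely inside the next buffer, and because only matches that fit completely in the current buffer are reported, none is reported twice. The carried tail never exceeds pattern length - 1 bytes, so this also works when the pattern is longer than a chunk and spans many of them. Chunk size then only affects memory use and throughput, not correctness; the 1 MB default below is an assumption, not a tuned value. This is a minimal sketch assuming the file is ASCII or UTF-8 text and the pattern is matched as its UTF-8 bytes; ChunkedSearch and FindAll are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class ChunkedSearch
{
    // Hypothetical helper: returns the byte offset of every occurrence of pattern (as UTF-8) in the file.
    // At most chunkSize + (pattern length - 1) bytes are held in memory at once.
    public static List<long> FindAll(string path, string pattern, int chunkSize = 1 << 20)
    {
        byte[] needle = Encoding.UTF8.GetBytes(pattern);
        if (needle.Length == 0) throw new ArgumentException("Pattern must not be empty.", nameof(pattern));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        int overlap = needle.Length - 1;       // a partial match at the end of a chunk is at most this long
        var buffer = new byte[overlap + chunkSize];
        var matches = new List<long>();
        int carried = 0;                       // bytes at the start of buffer kept from the previous read
        long bufferStart = 0;                  // file offset of buffer[0]

        using var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                          FileShare.Read, 1 << 16, FileOptions.SequentialScan);
        int read;
        while ((read = stream.Read(buffer, carried, chunkSize)) > 0)
        {
            int valid = carried + read;

            // Every match found here lies completely inside buffer[0..valid).
            int from = 0;
            int idx;
            while ((idx = buffer.AsSpan(from, valid - from).IndexOf(needle)) >= 0)
            {
                matches.Add(bufferStart + from + idx);
                from += idx + 1;               // +1 so overlapping matches are also found
            }

            // Carry the last (pattern length - 1) bytes forward: a match that straddles
            // this chunk boundary is then found in full on the next iteration.
            int keep = Math.Min(overlap, valid);
            Array.Copy(buffer, valid - keep, buffer, 0, keep);
            bufferStart += valid - keep;
            carried = keep;
        }
        return matches;
    }
}
```

The returned values are byte offsets into the file. Because UTF-8 is self-synchronizing, a byte-level match of a UTF-8 pattern in valid UTF-8 text always starts on a character boundary.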

Upvotes: 0

Views: 433

Answers (1)

sommmen

Reputation: 7628

Your question is still a bit vague about what you're actually looking for, but I hope this helps.

For large files you could use MemoryMappedFile, which lets you map a window (a "view") of the file into memory instead of loading the whole file:

https://learn.microsoft.com/en-us/dotnet/api/system.io.memorymappedfiles.memorymappedfile?view=netframework-4.8

Sample from MSDN, which maps a 512 MB view starting 256 MB into a large file and modifies it in place:

using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

class Program
{
    static void Main(string[] args)
    {
        long offset = 0x10000000; // 256 megabytes
        long length = 0x20000000; // 512 megabytes

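        // Create the memory-mapped file. The file must already exist and be at least
        // offset + length bytes long, otherwise creating the view below fails.
        using (var mmf = MemoryMappedFile.CreateFromFile(@"c:\ExtremelyLargeImage.data", FileMode.Open,"ImgA"))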
        {
            // Create a random access view, from the 256th megabyte (the offset)
            // to the 768th megabyte (the offset plus length).
            using (var accessor = mmf.CreateViewAccessor(offset, length))
            {
                int colorSize = Marshal.SizeOf(typeof(MyColor));
                MyColor color;

                // Make changes to the view.
                for (long i = 0; i < length; i += colorSize)
                {
                    accessor.Read(i, out color);
                    color.Brighten(10);
                    accessor.Write(i, ref color);
                }
            }
        }
    }
}

public struct MyColor
{
    public short Red;
    public short Green;
    public short Blue;
    public short Alpha;

    // Make the view brighter.
    public void Brighten(short value)
    {
        Red = (short)Math.Min(short.MaxValue, (int)Red + value);
        Green = (short)Math.Min(short.MaxValue, (int)Green + value);
        Blue = (short)Math.Min(short.MaxValue, (int)Blue + value);
        Alpha = (short)Math.Min(short.MaxValue, (int)Alpha + value);
    }
}
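The same view mechanism can be applied to searching: map the file in windows of windowSize bytes, but make each view (pattern length - 1) bytes longer than the window, so a match that starts near the end of a window is still fully visible. Only matches that start inside the window are reported, so nothing is counted twice. This is a sketch under the same UTF-8 assumption as a plain chunked search; MappedSearch, FindAll and the 64 MB default window are hypothetical, and copying each view into a byte array keeps the code simple at the cost of an extra copy.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

static class MappedSearch
{
    // Hypothetical helper: byte offsets of every occurrence of pattern (as UTF-8), scanned view by view.
    public static List<long> FindAll(string path, string pattern, int windowSize = 64 << 20)
    {
        byte[] needle = Encoding.UTF8.GetBytes(pattern);
        if (needle.Length == 0) throw new ArgumentException("Pattern must not be empty.", nameof(pattern));
        if (windowSize <= 0) throw new ArgumentOutOfRangeException(nameof(windowSize));

        var matches = new List<long>();
        long fileLength = new FileInfo(path).Length;
        if (fileLength < needle.Length) return matches;   // also avoids mapping an empty file, which throws

        // Read-only mapping; capacity 0 means "use the file's size"; no map name.
        using var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
        for (long start = 0; start + needle.Length <= fileLength; start += windowSize)
        {
            // View = window plus (pattern length - 1) bytes of overlap, clipped at end of file.
            long viewLength = Math.Min((long)windowSize + needle.Length - 1, fileLength - start);
            var bytes = new byte[viewLength];
            using (var view = mmf.CreateViewAccessor(start, viewLength, MemoryMappedFileAccess.Read))
            {
                view.ReadArray(0, bytes, 0, bytes.Length);
            }

            // Report only matches starting inside this window; later starts belong to the next window.
            int lastStart = (int)Math.Min(windowSize - 1, viewLength - needle.Length);
            int from = 0;
            while (from <= lastStart)
            {
                int idx = bytes.AsSpan(from).IndexOf(needle);
                if (idx < 0 || from + idx > lastStart) break;
                matches.Add(start + from + idx);
                from += idx + 1;
            }
        }
        return matches;
    }
}
```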

Upvotes: 2
