MoShe

Reputation: 6427

Looking for a fast way to read and search a file in C#

I have a 100 MB text file and I need to check every line for a special word. I am looking for a fast way to do it.

So I divide the file into 10 chunks:

public void ParseTheFile(BackgroundWorker bg)
    {

        Lines = File.ReadAllLines(FilePath);
        this.size = Lines.Length;
        chankSise=size/10;

        reports reportInst = new reports(bg,size);

        ParserThread [] ParserthreadArray = new ParserThread[10];

        for (int i = 0; i <ParserthreadArray.Length; i++)
        {
            ParserthreadArray[i] = new ParserThread((reportInst));
            ParserthreadArray[i].Init(SubArray(Lines,i * chankSise, chankSise), OutputPath);

        }

        Thread oThread0 = new Thread(ParserthreadArray[0].run);
        oThread0.IsBackground = true;
        Thread oThread1 = new Thread(ParserthreadArray[1].run);
        oThread1.IsBackground = true;
        Thread oThread2 = new Thread(ParserthreadArray[2].run);
        oThread2.IsBackground = true;
        Thread oThread3 = new Thread(ParserthreadArray[3].run);
        oThread3.IsBackground = true;
        Thread oThread4 = new Thread(ParserthreadArray[4].run);
        oThread4.IsBackground = true;
        Thread oThread5 = new Thread(ParserthreadArray[5].run);
        oThread5.IsBackground = true;
        Thread oThread6 = new Thread(ParserthreadArray[6].run);
        oThread6.IsBackground = true;
        Thread oThread7 = new Thread(ParserthreadArray[7].run);
        oThread7.IsBackground = true;
        Thread oThread8 = new Thread(ParserthreadArray[8].run);
        oThread8.IsBackground = true;
        Thread oThread9 = new Thread(ParserthreadArray[9].run);
        oThread9.IsBackground = true;

        oThread0.Start();
        oThread1.Start();
        oThread2.Start();
        oThread3.Start();
        oThread4.Start();
        oThread5.Start();
        oThread6.Start();
        oThread7.Start();
        oThread8.Start();
        oThread9.Start();

        oThread0.Join();
        oThread1.Join();
        oThread2.Join();
        oThread3.Join();
        oThread4.Join();
        oThread5.Join();
        oThread6.Join();
        oThread7.Join();
        oThread8.Join();
        oThread9.Join();
    }

This is the Init method:

public void Init(string [] olines,string outputPath)
    {
        Lines = olines;
        OutputPath = outputPath+"/"+"ThreadTemp"+threadID;
    }

This is the SubArray method:

public string [] SubArray(string [] data, int index, int length)
    {
        string [] result = new string[length];
        Array.Copy(data, index, result, 0, length);
        return result;
    }

And each thread does this:

 public void run()
    {

        if (!System.IO.Directory.Exists(OutputPath))
        {
            System.IO.Directory.CreateDirectory(OutputPath);
            DirectoryInfo dir = new DirectoryInfo(OutputPath);
            dir.Attributes |= FileAttributes.Hidden;
        }



        this.size = Lines.Length;
        foreach (string line in Lines)
        {



            bgReports.sendreport(allreadychecked);

            allreadychecked++;
            hadHandlerOrEngine = false;
            words = line.Split(' ');
            if (words.Length>4)
            {
                for (int i = 5; i < words.Length; i++)
                {
                    if (words[i] == "Handler" || words[i] == "Engine")
                    {

                        hadHandlerOrEngine = true;
                        string num = words[1 + i];
                        int realnum = int.Parse(num[0].ToString());
                        cuurentEngine = (realnum);
                        if (engineArry[realnum] == false)
                        {
                            File.Create(OutputPath + "/" + realnum + ".txt").Close();
                            engineArry[realnum] = true;

                        }
                        TextWriter tw = new StreamWriter(OutputPath + "/" + realnum + ".txt", true);
                        tw.WriteLine(line);
                        tw.Close();

                        break;
                    }
                }

            }

            if (hadHandlerOrEngine == false)
            {
                if (engineArry[cuurentEngine] == true)
                {
                    TextWriter tw = new StreamWriter(OutputPath + "/" + cuurentEngine + ".txt", true);
                    tw.WriteLine(line);
                    tw.Close();
                }

            }

        }
    }

My question: is there any way to make this run faster?

Upvotes: 2

Views: 730

Answers (3)

Jon Skeet

Reputation: 1500625

You haven't shown your Init method, but at the moment it looks like each of your threads will actually be checking all of the lines. Additionally, it looks like all of those may be trying to write to the same files - and not doing so in an exception-safe way (using using statements) either.

EDIT: Okay, so now we can see Init but we can't see SubArray. Presumably it just copies a chunk of the array.

How slow is this if you avoid using threads to start with? Is it definitely too slow? What is your performance target? It seems unlikely that using 10 threads is going to help though, given that at that point it's entirely memory/CPU-bound. (You should also try to avoid repeating so much code for starting all the threads - why aren't you using a collection for that?)
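For comparison, a single-threaded baseline might look something like the sketch below. It is only an illustration under some assumptions: the class and method names (`SingleThreadedSplitter`, `FindEngine`, `Split`) are made up, the line format is guessed from the code in the question (keyword at word index 5 or later, followed by a word starting with a digit), and lines seen before the first match are skipped. It streams the file with `File.ReadLines`, keeps one writer open per engine instead of reopening the file for every line, and closes the writers in a `finally` block.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class SingleThreadedSplitter
{
    // Hypothetical helper: returns the digit that starts the word after
    // "Handler" or "Engine", or -1 when the line has no usable marker.
    public static int FindEngine(string line)
    {
        string[] words = line.Split(' ');
        // Same scan range as the question (index 5 onwards), but stop one
        // word early so words[i + 1] can never go out of range.
        for (int i = 5; i < words.Length - 1; i++)
        {
            if (words[i] == "Handler" || words[i] == "Engine")
            {
                string next = words[i + 1];
                if (next.Length > 0 && next[0] >= '0' && next[0] <= '9')
                    return next[0] - '0';
                return -1;
            }
        }
        return -1;
    }

    public static void Split(string inputPath, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        var writers = new Dictionary<int, StreamWriter>();
        try
        {
            int current = -1; // no engine seen yet
            // File.ReadLines streams the file lazily instead of loading it all.
            foreach (string line in File.ReadLines(inputPath))
            {
                int engine = FindEngine(line);
                if (engine >= 0) current = engine;
                if (current < 0) continue; // nothing to attach this line to

                StreamWriter writer;
                if (!writers.TryGetValue(current, out writer))
                {
                    // Opened once per engine; overwrites any existing file.
                    writer = new StreamWriter(Path.Combine(outputDir, current + ".txt"));
                    writers[current] = writer;
                }
                writer.WriteLine(line);
            }
        }
        finally
        {
            // Writers are closed even if parsing throws part-way through.
            foreach (StreamWriter writer in writers.Values) writer.Dispose();
        }
    }
}
```

Timing something like this against the 10-thread version would show whether the threads are buying anything at all.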

Upvotes: 7

s_nair

Reputation: 812

I would like to recommend something which may be useful. As someone said, there is no point in assigning multiple threads to read your file, since this is mostly I/O activity, which in this case gets queued up in the OS file manager. But you can definitely place an async I/O request for any available I/O completion thread to look after.

Now when it comes to processing the file, I would recommend you use memory-mapped files. Memory-mapped files are ideal for scenarios where an arbitrary chunk (view) of a considerably larger file needs to be accessed repeatedly or separately. In your scenario, memory-mapped files can help you split and reassemble the file if the chunks arrive or are processed out of order. I have no handy examples at the moment. Have a look at the following article: Memory Mapped Files.

Upvotes: 1

Joe

Reputation: 42627

You are probably IO bound, so I'd guess that multiple threads aren't going to help much. (Odds are your program spends most of its time here: Lines = File.ReadAllLines(FilePath); and not that much time actually parsing. You should measure though.) In fact, your SubArray splitting is possibly slower than if you just passed the whole thing to a single parser thread.
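A rough way to measure this is to time the read and the scan separately with `Stopwatch`. The sketch below is only an illustration: `ReadVsParseTimer` and `Measure` are made-up names, and the scan is a simplified stand-in (a plain `Contains` check) rather than the parsing logic from the question.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class ReadVsParseTimer
{
    // Returns { read time in ms, scan time in ms, number of matching lines }.
    public static long[] Measure(string path)
    {
        Stopwatch sw = Stopwatch.StartNew();
        string[] lines = File.ReadAllLines(path); // the suspected I/O hot spot
        long readMs = sw.ElapsedMilliseconds;

        sw.Restart();
        int hits = 0;
        foreach (string line in lines)
        {
            // Simplified stand-in for the real per-line parsing.
            if (line.Contains("Handler") || line.Contains("Engine")) hits++;
        }
        long scanMs = sw.ElapsedMilliseconds;

        Console.WriteLine("read: {0} ms, scan: {1} ms, hits: {2}", readMs, scanMs, hits);
        return new long[] { readMs, scanMs, hits };
    }
}
```

Bear in mind that a second run will often read much faster because the OS has cached the file, so compare like with like.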

I would be looking at MemoryMappedFile (if this is .NET 4) which should help some with IO by not having to make copies of all the source data.
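As a minimal sketch of what that could look like (assuming .NET 4 or later, a non-empty file, and UTF-8 or ASCII content; `MappedFileScanner` and `CountMatchingLines` are made-up names), you can map the file, open a read-only view stream over it, and read lines from that:

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

static class MappedFileScanner
{
    // Counts the lines containing the keyword, reading through a memory-mapped view.
    // Note: mapping a zero-length file this way throws, so the file must not be empty.
    public static int CountMatchingLines(string path, string keyword)
    {
        long length = new FileInfo(path).Length;
        int count = 0;

        using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        // Pass the exact file length: a default-sized view is rounded up to a
        // page boundary and would expose trailing zero bytes after the data.
        using (MemoryMappedViewStream view = mmf.CreateViewStream(0, length, MemoryMappedFileAccess.Read))
        using (StreamReader reader = new StreamReader(view))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Contains(keyword)) count++;
            }
        }
        return count;
    }
}
```

Whether this actually beats a plain `StreamReader` or `File.ReadLines` depends on the machine and the file, so it is worth timing both.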

Upvotes: 6
