limas

Reputation: 27

Add in ArrayList<Integer> takes too long (more than 50000 nodes)

I have a problem. I want to build a search engine based on IR (information retrieval) systems. I have some files, I extract the information I need, and I store it in structures such as HashMaps, TreeMaps, ArrayLists, etc. Then I want to write this information to files, so I open two FileWriters at the same time and keep adding more and more strings to them.

But this procedure takes too long, and I don't know why. Once I have put everything into the FileWriter, I close it with close().

Do you think the problem is the reallocation that happens every time I add new strings to my buffers?

Should I follow another strategy: open the buffer, write, close it, and the next time open it again to write at the end of the previous data? Would that take less time?

P.S.: The code works exactly as I want for a small input file. The problem appears when I use many large input files.

public static void writeWordsandDfInFile(Map<String, Word> tmpMap) throws IOException
{
    Set tmpSet = tmpMap.entrySet();//Transform to Set for quick iteration  and printing
    Iterator tmpIt = tmpSet.iterator();
    String le3h=null;
    int bytesPostingFile;
    int bytesVocabularyFile;
    String str_out = null;
    String prev_str_out = null;
    String str_out2 = null;
    String str_tmp;
    String str_tmp2;
    String Tstrt;
    int prevctr=0;
    int flag=0;
    int i=0;
    int j;
    int k;
    int flag2;
    int flag3;
    int docId;
    //////////////////
    int SIZEDocumentsFileBytes;
    int prevInDocumentsFileBytes = 0;
    int newInDocumentsFileBytes = 0;
    int prwth_kataxwrhsh;
    int ctrPostingFileBytes=0;
    int prwthMonofora=0;



    giveWrdTakeBytePos=new HashMap<String,Integer>();// given a word (le3h), returns its position in bytes within VocabularyFile.txt

    // Create file
    FileWriter fstream = new FileWriter(vocabularyFile.getPath());
    BufferedWriter out = new BufferedWriter(fstream);
    out.write("Le3h   Df   PosInPostingFile.txt\n\n");
    str_tmp=("Le3h   Df   PosInPostingFile.txt\n\n");

      // Create file
    FileWriter fstream2 = new FileWriter(postingFile.getPath());
    BufferedWriter out2 = new BufferedWriter(fstream2);
    out2.write("DocId  Tf  LineInFile       PosInDocumentsFile\n\n");
    str_tmp2=("DocId  Tf  LineInFile       PosInDocumentsFile\n\n");



    PostingFileBytes=new ArrayList<Integer>();// keeps the bytes for each record in the PostingFile



    flag=0;
    i=0;
    while(tmpIt.hasNext())
    {

         Map.Entry m = (Map.Entry) tmpIt.next();
         le3h=(String)m.getKey();

         Set s = tmpMap.get(le3h).getDocList().entrySet();
         Iterator it = s.iterator();
         Map.Entry mm =(Map.Entry)it.next();
         docId=(Integer)mm.getKey();


         Set ss=tmpMap.get(le3h).getDocList().keySet();

         Set stf=tmpMap.get(le3h).getTf().keySet();

         Iterator ssIt = ss.iterator();




         flag2=0;
         prwth_kataxwrhsh=0;
         while(ssIt.hasNext())
         {
            docId=(Integer)ssIt.next();

            out2.write(docId+"  "+tmpMap.get(le3h).getTf(docId));// write each docId and its Tf to the posting file
            if(flag2==0)
            {
                str_out2=(docId+"  "+tmpMap.get(le3h).getTf(docId));
                flag2=1;
            }
            else
            {
                str_out2=(docId+"  "+tmpMap.get(le3h).getTf(docId));
            }



            flag3=0;
            Tstrt=null;
            for(k=0;k<tmpMap.get(le3h).ByteList.get(docId).size();k++)
            {
                out2.write("  "+tmpMap.get(le3h).ByteList.get(docId).get(k));

                if(flag3==0)
                {
                    Tstrt=("  "+tmpMap.get(le3h).ByteList.get(docId).get(k));
                    flag3=1;
                }
                else
                {
                    Tstrt=Tstrt+("  "+tmpMap.get(le3h).ByteList.get(docId).get(k));
                }

            }
            str_out2=str_out2+Tstrt;
            out2.write("  ->"+DocumentsFileBytes.get(docId)+"\n");
            str_out2=str_out2+("  ->"+DocumentsFileBytes.get(docId)+"\n");
            bytesPostingFile=str_out2.toString().length();

        ////////////////////////////////////////////////////////////////////////////////////////////////



            //................................................................................................................................
          SIZEDocumentsFileBytes=PostingFileBytes.size();

          if(prwthMonofora==0)
          {
            prevInDocumentsFileBytes=str_tmp2.toString().length();

            prwthMonofora=1;

            PostingFileBytes.add(prevInDocumentsFileBytes);
            ctrPostingFileBytes=0;// i.e. there is an entry at position 0 of the posting file
            newInDocumentsFileBytes=prevInDocumentsFileBytes + bytesPostingFile;
            //System.out.println("NEXT: "+newInDocumentsFileBytes);
          }
          else
          {
              if(prwth_kataxwrhsh==0)// for each word, only the first time, even if it has DF>1
              {
                    //System.out.println("Prev. value:"+prevInDocumentsFileBytes);
                    prevInDocumentsFileBytes=newInDocumentsFileBytes;// from before
                    //System.out.println("ADDING: "+prevInDocumentsFileBytes);
                    PostingFileBytes.add(prevInDocumentsFileBytes);
                    ctrPostingFileBytes++;
                    prwth_kataxwrhsh=1;
              }
              else
              {
                prevInDocumentsFileBytes=newInDocumentsFileBytes;
              }
              newInDocumentsFileBytes=prevInDocumentsFileBytes + bytesPostingFile;
              //System.out.println("NEXT: "+newInDocumentsFileBytes);
          }


         }


         //------------------------------------------------------------------------------------------------------------------


         int ptr=ctrPostingFileBytes;

         out.write(le3h+"  "+tmpMap.get(le3h).getDf());// write each word and its Df to VocabularyFile.txt

         out.write("  ->"+PostingFileBytes.get(ptr)+"\n");


           if(flag==0)// the first time
            {
               str_out=(le3h+"  "+tmpMap.get(le3h).getDf()+"  ->"+PostingFileBytes.get(ptr)+"\n");
               giveWrdTakeBytePos.put(le3h, str_tmp.toString().length());
               flag=1;
               prev_str_out=str_tmp+str_out;
            }
            else
            {
                giveWrdTakeBytePos.put(le3h, prev_str_out.toString().length());

                str_out=str_out+(le3h+"  "+tmpMap.get(le3h).getDf()+"  ->"+PostingFileBytes.get(ptr)+"\n");
                prev_str_out=prev_str_out+(le3h+"  "+tmpMap.get(le3h).getDf()+"  ->"+PostingFileBytes.get(ptr)+"\n");
            }

      //................................................................................................................................


    }

    //Close the output stream
    out.close();

    //Close the output stream
    out2.close();

}

Upvotes: 1

Views: 523

Answers (1)

Angelo Fuchs

Reputation: 9941

From what I can see, you never append to a file but always write it anew. But from what you wrote above (I have not read the whole code), you want to append data to the file.

new FileWriter("path", true);

Does that help you?
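A minimal sketch of what append mode does, assuming a temporary file and the hypothetical names `AppendDemo` and `appendLine` (neither is from the question's code). The second `FileWriter` argument set to `true` makes each newly opened writer continue at the end of the existing data instead of truncating the file:

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class AppendDemo {
    // Hypothetical helper: opens the file in append mode, writes one line, closes it.
    static void appendLine(Path file, String line) {
        try (BufferedWriter out = new BufferedWriter(new FileWriter(file.toFile(), true))) {
            out.write(line);
            out.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Appends two lines with two separate writers and reads the file back.
    static List<String> demo() {
        try {
            Path file = Files.createTempFile("vocabulary", ".txt"); // assumed scratch file
            appendLine(file, "first");
            appendLine(file, "second");
            List<String> lines = Files.readAllLines(file);
            Files.delete(file);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // both lines survive, in order
    }
}
```

Note that reopening the file for every record is usually slower than keeping one buffered writer open, so append mode is mainly useful when the writes really happen at different times.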

Another suggestion: drop the file writing and use this instead:

public static void foo()
{
    // ...

    byte[] fifeMBByteAryOne = new byte[5242880];
    ByteArrayStream bStream = new ByteArrayStream(fifeMBByteAryOne);
    BufferedWriter out = new BufferedWriter(new OutputStreamWriter(bStream));
    byte[] fifeMBByteAryTwo = new byte[5242880];
    ByteArrayStream bStream2 = new ByteArrayStream(fifeMBByteAryTwo);
    BufferedWriter out2 = new BufferedWriter(new OutputStreamWriter(bStream2));

    // ...

}

private static class ByteArrayStream extends OutputStream {
    int index = 0;
    byte[] container;

    public ByteArrayStream(byte[] container) {
        this.container = container;
    }

    @Override
    public void write(int b) throws IOException {
        container[index++] = (byte)b;
    }

}

Then run it again and see how long it takes. If it is as slow as before, writing to the file is not your problem.
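The same experiment can also be sketched with the standard `java.io.ByteArrayOutputStream`, which grows as needed instead of relying on a fixed 5 MB array. This is a swapped-in alternative to the custom class above, not part of the original suggestion; the class name `InMemoryWriteTest` and the sample lines are assumptions for illustration:

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class InMemoryWriteTest {
    // Writes `lines` short records to memory only and returns the number of bytes produced.
    static int writeToMemory(int lines) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // grows automatically
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(sink, StandardCharsets.UTF_8))) {
            for (int i = 0; i < lines; i++) {
                out.write("docId " + i + "\n"); // placeholder record, not the real format
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and therefore flushed) before the size is read.
        return sink.size();
    }

    public static void main(String[] args) {
        System.out.println(writeToMemory(50_000) + " bytes written to memory");
    }
}
```

If the method is just as slow with an in-memory sink, the time is being spent in building the strings, not in the disk writes.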


After having read through the code, I'm fairly sure that you are a student or a beginner in Java programming. That's fine, but you should have stated that in your question. It also leads people to give you advice rather than direct solutions to your problem.

There are a lot of things you could improve. The first, and from my point of view very important: your coding style needs improvement. Really! There are standards for how to write variables (starting with a lowercase letter), methods and so on. Use them. You use far more variables than you need, and you define them all at the beginning of the method. You use Sets and Iterators when you don't need them, e.g.:

Set s = currentWord.getDocList().entrySet();
Iterator it = s.iterator();
Map.Entry mm = (Map.Entry) it.next();
docId = (Integer) mm.getKey();

Then you never use that value of docId, but of course this action takes time.
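As a rough sketch of the alternative, a typed for-each loop over `entrySet()` removes the raw `Set`/`Iterator` pairs, the casts, and the repeated `tmpMap.get(le3h)` lookups. The map shape used here (word to docId to term frequency) and the name `PostingLines` are simplifying assumptions; the real `Word` class from the question is not shown in the post:

```java
import java.util.Map;
import java.util.TreeMap;

class PostingLines {
    // Assumed, simplified structure: word -> (docId -> term frequency).
    static String format(Map<String, Map<Integer, Integer>> tfByDocByWord) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Map<Integer, Integer>> wordEntry : tfByDocByWord.entrySet()) {
            String word = wordEntry.getKey();
            // Key and value come straight from the entry: no second lookup, no casts.
            for (Map.Entry<Integer, Integer> posting : wordEntry.getValue().entrySet()) {
                sb.append(word).append("  ")
                  .append(posting.getKey()).append("  ")
                  .append(posting.getValue()).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Integer>> index = new TreeMap<>();
        Map<Integer, Integer> docs = new TreeMap<>();
        docs.put(7, 2);
        index.put("cat", docs);
        System.out.print(format(index));
    }
}
```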

Rewrite that method, and this time understand what you are doing and do only what you need, when you need it. The way it is now, I would not allow anyone in my company to use it for a customer.

Second: When you post code on the internet, be sure to post code that compiles directly. I needed 15 minutes to get that code to compile. There are very few people around who have that much patience.

Third: For situations where you write less than ~2 MB of text, it's usually useful to use a StringBuilder to construct the whole text and write it in one go at the end. That also makes debugging easier.
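A minimal sketch of that idea, assuming a simplified map from word to document frequency and the hypothetical name `VocabularyText` (the header text is made up as well). Unlike `str_out = str_out + ...` inside a loop, which copies the accumulated text on every iteration, `StringBuilder.append` adds to a growing buffer:

```java
import java.util.Map;
import java.util.TreeMap;

class VocabularyText {
    // Builds the complete file content in memory; the caller writes it out once.
    static String build(Map<String, Integer> dfByWord) {
        StringBuilder sb = new StringBuilder();
        sb.append("Word  Df\n\n"); // placeholder header, not the question's exact one
        for (Map.Entry<String, Integer> entry : dfByWord.entrySet()) {
            sb.append(entry.getKey()).append("  ").append(entry.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> df = new TreeMap<>();
        df.put("apple", 3);
        df.put("pear", 1);
        System.out.print(build(df));
    }
}
```

Byte positions such as the ones kept in `giveWrdTakeBytePos` could then be taken from `sb.length()` before each append, rather than from the length of a second, ever-growing `String`. Keep in mind that this is a character count, which equals the byte count only for single-byte encodings.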

Fourth: Before you post code on the internet, be sure to have thought about the problem yourself and tried to solve it. In this case you could use Dates to do so; just write something like:

// requires: import java.util.Date;
// at the beginning of a loop
long startedAt = new Date().getTime();
// somewhere within the loop:
System.out.println("in situation X " + (new Date().getTime() - startedAt));

That way you can see how long each step takes and then start optimizing that area.
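The same measurement can be wrapped in a small helper. This sketch swaps in `System.nanoTime()`, which is intended for measuring elapsed time, in place of `Date`; the names `LoopTimer` and `elapsedMillis` are made up for illustration:

```java
class LoopTimer {
    // Runs one step and returns how long it took, in milliseconds.
    static long elapsedMillis(Runnable step) {
        long startedAt = System.nanoTime();
        step.run();
        return (System.nanoTime() - startedAt) / 1_000_000;
    }

    public static void main(String[] args) {
        long ms = elapsedMillis(() -> {
            // Stand-in workload: repeated String concatenation in a loop.
            String s = "";
            for (int i = 0; i < 20_000; i++) {
                s = s + i;
            }
        });
        System.out.println("in situation X " + ms + " ms");
    }
}
```

Wrapping each suspicious part of the method this way shows which one dominates the run time.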

Fifth: If there is still a problem after the fourth step, be sure to post a short piece of code that clearly demonstrates your problem. Don't rely on other users to understand your problem; show it to them. Make it easy for them by using self-explanatory variable, method and class names in the language you are asking in. The same goes for your comments.

Sixth: The reason you should do all this is to enable you to solve your problems yourself and to ask people with more advanced skills only about problems that are worth their time.

Good luck

Upvotes: 3
