Yuhao

Reputation: 1610

Java: How to Count File Size Correctly in a Real-time manner

I am programming in Java. I want to count the size of all files in a specific folder periodically. The period is not constant and is very short. My code is as follows:

// Get index size: sum the lengths of the files directly inside indexPath
index_byte_size = 0;
File index = new File(indexPath);
File[] files = index.listFiles(); // null if indexPath is not a readable directory
if (files != null) {
    for (File f : files) {
        index_byte_size += f.length();
    }
}

index_byte_size is what I want to get. indexPath is the path of the folder.

The code runs in a loop, and I output the total index_byte_size after every iteration. As far as I know, the total size should increase continually. However, the result I get looks like this:

IndexSize(byte)  Time(ms)
0                297
0                802
0                1293
0                1710
7769547          2952
7769547          4330
7769547          4431
7769547          4785
7769547          4901
7769547          5213
7769547          5279
7769547          5446
7769547          5660
7769547          5861
7769547          6155
24041054         8763
24041054         9203
24041054         10439
24041054         10820
24041054         11685
36708630         13662
36708630         14309
36708630         16065
36708630         16192
36708630         16374
36708630         16691
36708630         16899
...

As you can see, the reported size jumps, stays constant for a while, and then jumps again. I don't know what is happening, and I suspect it has something to do with the operating system. My OS is Windows 7.


[Background]

I want to run an experiment with Lucene to measure its indexing capacity, especially its index size and indexing efficiency.

I have a lot of small text files (each 2-10 MB in size). I want to see how long it takes Lucene to index each of them one by one, and how big the index becomes. So I wrote this program.

I don't want to be notified when the indices change (because of course they will change). I just want to know how much time has passed and how big the index is, sampled at very short intervals.


Does anyone know why? And how can I count the size correctly in a real-time manner?

Upvotes: 1

Views: 620

Answers (1)

Peter Lawrey

Reputation: 533492

It is common for applications to buffer output and only push out data in lumps.

I suspect this is not the case here. Instead, I suspect Lucene is using memory-mapped files. A memory-mapped file grows with each allocation you make. An allocation is expensive, but allocating more than you need is rather cheap (it uses virtual memory, and only uses main memory and disk as you touch it), so the most efficient thing to do is to allocate large blocks and then fill them up lazily. (E.g. I allocate 128 MB at a time with a 64-bit JVM.)
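To illustrate the idea (this is not Lucene's code, just a minimal sketch): on the common JDK implementations, mapping a region past the end of a file in READ_WRITE mode extends the file to cover the mapping, so the reported length jumps by a whole block before any real data is written. The class name, method name and the 1 MB block size are hypothetical choices for the demonstration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class MappedGrowthDemo {
    // Hypothetical block size, chosen only for this illustration.
    static final int BLOCK_SIZE = 1 << 20; // 1 MB

    // Returns the file length before mapping, after mapping one block,
    // and after writing a single byte into the mapped block.
    static long[] lengthsAroundMapping(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel channel = raf.getChannel()) {
            long before = channel.size();
            // Map a whole block in one go; the file is extended to cover it.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, BLOCK_SIZE);
            long afterMap = channel.size();
            // Touch a single byte: the reported length does not change.
            buffer.put(0, (byte) 1);
            long afterWrite = channel.size();
            return new long[] {before, afterMap, afterWrite};
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped-growth", ".bin");
        file.toFile().deleteOnExit();
        long[] sizes = lengthsAroundMapping(file);
        System.out.println("before map:  " + sizes[0]);
        System.out.println("after map:   " + sizes[1]);
        System.out.println("after write: " + sizes[2]);
    }
}
```

A poller that calls File.length() on such a file would see the size stay flat while the block is being filled and then jump when the next block is mapped, which matches the staircase pattern in the question.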

File.length gives you the extent of the file, not how much of it has actually been used or even how much disk space it occupies. You can see how much disk space a file occupies with du on Unix, and possibly with some tool in Java 7 (I have only found the space used for file system roots, not for individual files).
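For reference, the per-file-system figure mentioned above can be read through java.nio.file.FileStore, which Java 7 introduced. This is only a sketch: it reports usage for the whole store that holds the given path, not for a single file, and the class and method names are made up for the example.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileStoreUsage {
    // Bytes in use on the file store (volume/partition) that holds the path.
    // This is a whole-store number, not the space used by one file.
    static long usedBytesOnStore(Path path) throws IOException {
        FileStore store = Files.getFileStore(path);
        return store.getTotalSpace() - store.getUnallocatedSpace();
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Bytes used on store: " + usedBytesOnStore(path));
    }
}
```

Sampling this value before and after indexing gives a rough idea of real disk consumption, but anything else writing to the same volume is counted too.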

Even so, this tells you how many pages have been touched. The only way to know accurately how much has been used is to read the file and this has limited accuracy if the file is being modified while you read it.
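A rough way to "read the file" is to scan for the last non-zero byte. The sketch below assumes that the unused, preallocated tail of the file is zero-filled and that real data does not end in zero bytes; neither is guaranteed for a Lucene index, so treat the result as an estimate. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class UsedBytesEstimate {
    // Returns the offset just past the last non-zero byte in the file.
    // Assumption: the untouched tail of the file reads back as zeros.
    static long estimateUsedBytes(Path file) throws IOException {
        long lastNonZero = -1;
        long offset = 0;
        byte[] chunk = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(chunk)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (chunk[i] != 0) {
                        lastNonZero = offset + i;
                    }
                }
                offset += n;
            }
        }
        return lastNonZero + 1;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a preallocated file: 3 data bytes followed by zeros.
        Path file = Files.createTempFile("used-bytes", ".bin");
        file.toFile().deleteOnExit();
        byte[] content = new byte[4096];
        content[0] = 1;
        content[1] = 2;
        content[2] = 3;
        Files.write(file, content);
        System.out.println("File length:    " + Files.size(file));
        System.out.println("Estimated used: " + estimateUsedBytes(file));
    }
}
```

Scanning the whole file on every sample is expensive, and the result can be stale or inconsistent if the writer is still modifying the file during the scan.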

EDIT: On Windows 7 the space appears to be reserved immediately, so you cannot create a sparse file larger than the size of the file system (as you can on ext4 file systems).

Upvotes: 2
