Reputation: 1610
I am programming in Java, and I want to periodically count the total size of all files in a specific folder. The period is not constant and is very short. My code is as follows:
// get index size
index_byte_size = 0;
File index = new File(indexPath);
String[] files = index.list();
if (files != null) { // list() returns null if the path is not a directory
    for (int i = 0; i < files.length; i++) {
        File f = new File(index, files[i]);
        index_byte_size += f.length();
    }
}
index_byte_size is the value I want, and indexPath is the path of the folder.
The code runs in a loop, and I print the total index_byte_size after every iteration. To my knowledge, the total size should increase continually. However, the result I get looks like this:
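For reference, the same sampling loop can be sketched with the java.nio.file API, which handles a vanishing or non-directory path more gracefully than File.list(). The class and method names here are my own invention for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class IndexSizeProbe {
    // Sums the sizes of the regular files directly under dir.
    // A file may disappear between listing and stat'ing while the
    // index is being written, so unreadable entries count as 0.
    static long directorySize(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                          .mapToLong(p -> {
                              try {
                                  return Files.size(p);
                              } catch (IOException e) {
                                  return 0L; // file vanished mid-scan
                              }
                          })
                          .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(directorySize(dir) + " bytes");
    }
}
```

Note that this reports the same file *extents* as File.length(), so it will show the same stair-step pattern; it just avoids the NullPointerException risk of list().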
IndexSize(byte) Time(ms)
0 297
0 802
0 1293
0 1710
7769547 2952
7769547 4330
7769547 4431
7769547 4785
7769547 4901
7769547 5213
7769547 5279
7769547 5446
7769547 5660
7769547 5861
7769547 6155
24041054 8763
24041054 9203
24041054 10439
24041054 10820
24041054 11685
36708630 13662
36708630 14309
36708630 16065
36708630 16192
36708630 16374
36708630 16691
36708630 16899
...
As you can see, the total size increases, then stays constant, then increases again... I don't know what is happening, and I guess the operating system is involved somehow. My OS is Windows 7.
[Background]
I want to run an experiment with Lucene to measure its indexing capacity, especially its index size and indexing efficiency.
I have a lot of small text files (each 2-10 MB). I want to see how long it takes Lucene to index them one by one, and how big the index becomes, so I wrote this program.
I don't need to be notified when the index changes (of course it will change). I just want to know how long indexing takes and how big the index is, sampled at very short intervals.
Does anyone know why this happens? And how can I count the size correctly in a real-time manner?
Upvotes: 1
Views: 620
Reputation: 533492
It is common for applications to buffer output and only push data out in lumps.
I suspect that is not the case here, though. Instead, I suspect Lucene is using memory-mapped files. When you grow a memory-mapped file, it grows with each allocation you make. Since an allocation is expensive, but allocating more than you need is rather cheap (it uses virtual memory, and consumes main memory and disk only as you touch it), the most efficient strategy is to allocate large blocks and fill them lazily. (E.g. I allocate 128 MB at a time with a 64-bit JVM.)
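The effect is easy to reproduce: mapping a region beyond the end of a file grows the file to cover the whole region immediately, before any data is written. A minimal sketch (the class name is made up for illustration):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;

public class MappedGrowth {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("mapped", ".bin");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw");
             FileChannel ch = raf.getChannel()) {
            // Map a 16 MB region without writing a single byte to it.
            // Per the FileChannel.map contract, the file is grown to
            // cover the mapped region.
            ch.map(FileChannel.MapMode.READ_WRITE, 0, 16 << 20);
            System.out.println(f.length()); // prints 16777216
        }
    }
}
```

So a writer that maps ahead in large blocks produces exactly the stair-step file sizes seen in the question.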
File.length() gives you the extent of the file, not how much has actually been used, or even how much disk space it occupies. You can see how much disk space is used with du on Unix, and possibly with some tool in Java 7 (I have only found the space used for file system roots, not for individual files).
Even then, that only tells you how many pages have been touched. The only way to know accurately how much has been used is to read the file, and this has limited accuracy if the file is being modified while you read it.
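The Java 7 facility I mean is the FileStore API, which only reports space at the file-system level, not per file. A sketch of what it gives you:

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Paths;

public class StoreSpace {
    public static void main(String[] args) throws IOException {
        // Reports space for the whole file system containing ".",
        // not for any individual file.
        FileStore store = Files.getFileStore(Paths.get("."));
        long used = store.getTotalSpace() - store.getUnallocatedSpace();
        System.out.println("used on this file system: " + used + " bytes");
    }
}
```

Watching this value change between samples approximates how much disk the index is really consuming, but it is polluted by everything else writing to the same file system.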
EDIT: on Windows 7 the space appears to be reserved immediately, so you cannot create a sparse file larger than the size of the file system (as you can on ext4 filesystems).
Upvotes: 2