Reputation: 18747
I was writing a Java application to read large text files in which the data is laid out in character columns, e.g.:
A B R S Y E ...
R E W I W I ...
E Q B U O Y ...
W Q V G O R ...
That is, single letters separated by spaces. Each row has millions of such characters, and each file has several such rows.
My job was to manipulate the file by columns, so I read the file line by line, split each line on ' ' and stored the result in an array. From those arrays I built a 2-D array. Everything was fine when I tested it on a small file with 10 rows, but it started failing on files with, say, 500 rows. My machine and JVM have lots of memory, so I didn't expect this. I did some profiling and saw that reading the lines into String[] was taking a LOT more memory than expected, so I changed String[] to char[]. Memory usage came down dramatically and everything was fine.
My question is: why does String[] take so much more space than char[]? Is it because it is an array of objects (since String is also an Object)? If someone can explain the low-level details, that would be really great.
Here is what I was doing before:
String[] parts = line.split(" "); // Creating a String[]
This is what I changed it to:
String rowNoSpaces = line.replaceAll(" ", ""); // Removing all the spaces
char[] columns = rowNoSpaces.toCharArray(); // Creating a char[], instead of String[]
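For reference, the same char[] can be built without the regex-based replaceAll and without the intermediate String. This is only a sketch: ColumnParser and parseColumns are made-up names, and it assumes every line is strictly one character followed by one space.

```java
public class ColumnParser {

    // Hypothetical helper: assumes the strict "X Y Z ..." layout,
    // i.e. the data characters sit at the even indices 0, 2, 4, ...
    static char[] parseColumns(String line) {
        int n = (line.length() + 1) / 2;      // number of data characters
        char[] columns = new char[n];
        for (int i = 0; i < n; i++) {
            columns[i] = line.charAt(2 * i);  // skip the separating spaces
        }
        return columns;
    }

    public static void main(String[] args) {
        char[] columns = parseColumns("A B R S Y E");
        System.out.println(new String(columns));
    }
}
```

If the input can contain repeated spaces or multi-character fields, this shortcut does not apply and the replaceAll version is the safer choice.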
Let me know if more info is needed.
Upvotes: 2
Views: 1140
Reputation: 887225
Since char is a primitive type, an array of chars stores the values directly in the array (two bytes each), with no per-character overhead at all.
By contrast, String is an object, so the array stores pointers to String instances elsewhere in the heap, each of which has its own overhead: an object header (including the class pointer), a length, and other information (including a separate reference to a char[] holding the actual text). Having lots of objects also increases the risk of GC heap fragmentation.
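To put rough numbers on this, here is a back-of-the-envelope sketch. The sizes are assumptions, not guarantees: they match a typical 64-bit HotSpot JVM with compressed references (12-byte object header, 16-byte array header, 4-byte references, 8-byte alignment) and a String that holds a reference to a char[] plus a cached hash. Other JVMs and versions differ. For example, Java 9+ backs strings with a byte[], and a JVM may share identical strings. The class and method names are made up for illustration.

```java
public class MemoryEstimate {

    // Assumed sizes (64-bit HotSpot, compressed references); not mandated by the Java spec.
    static final long OBJECT_HEADER = 12;
    static final long ARRAY_HEADER = 16;  // object header plus the 4-byte length field
    static final long REFERENCE = 4;
    static final long CHAR = 2;

    // Objects are assumed to be padded to a multiple of 8 bytes.
    static long align(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // A char[] stores its n characters inline, right after the array header.
    static long charArrayBytes(long n) {
        return align(ARRAY_HEADER + n * CHAR);
    }

    // A one-character String: the String object itself plus its own backing char[1].
    static long singleCharStringBytes() {
        long stringObject = align(OBJECT_HEADER + REFERENCE + 4); // header + array ref + cached hash
        return stringObject + charArrayBytes(1);
    }

    // A String[] stores n references, each pointing at a separate String.
    static long stringArrayBytes(long n) {
        return align(ARRAY_HEADER + n * REFERENCE) + n * singleCharStringBytes();
    }

    public static void main(String[] args) {
        long n = 1_000_000; // characters per row
        System.out.println("char[]   : " + charArrayBytes(n) + " bytes");
        System.out.println("String[] : " + stringArrayBytes(n) + " bytes");
    }
}
```

Under these assumptions a row of one million characters costs about 2 MB as a char[] and about 52 MB as a String[] of one-character strings, which is consistent with the blow-up seen at a few hundred rows.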
In addition, if you build the strings by concatenation instead of with a StringBuilder, you'll also get lots of extra copies taking up much more memory.
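A minimal sketch of that difference, using made-up names (JoinDemo, joinWithConcat, joinWithBuilder). Both methods return the same text, but the first allocates a fresh String on every iteration while the second appends into one growing buffer.

```java
public class JoinDemo {

    // Each += builds a new String and copies everything accumulated so far,
    // leaving the previous String behind as garbage.
    static String joinWithConcat(char[] columns) {
        String result = "";
        for (char c : columns) {
            result += c;
        }
        return result;
    }

    // StringBuilder appends into a single internal buffer,
    // sized up front here to avoid regrowth.
    static String joinWithBuilder(char[] columns) {
        StringBuilder sb = new StringBuilder(columns.length);
        for (char c : columns) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char[] columns = {'A', 'B', 'R', 'S'};
        System.out.println(joinWithConcat(columns));
        System.out.println(joinWithBuilder(columns));
    }
}
```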
Upvotes: 10