Jakoss

Reputation: 5255

Huge String Table in Java

I've got a question about storing a huge number of Strings in application memory. I need to load about 5 million lines from a file and keep them in memory. Each line is a URL of at most 255 characters, but most are around 50. From time to time I'll need to search for one of them. Is it possible to make this application run in ~1 GB of RAM?

Will

ArrayList<String> list = new ArrayList<String>();

work?

As far as I know, a String in Java is encoded in UTF-8, which gives me huge memory use. Is it possible to build such a list with Strings encoded in ANSI?

This is a console application run with these parameters:

java -Xmx1024M -Xms1024M -jar "PServer.jar" nogui

Upvotes: 3

Views: 2457

Answers (3)

Mr.Eddart

Reputation: 10273

A Java String is a full-blown object. This means that apart from the characters of the string themselves, there is other information stored with it (an object header that includes a pointer to the object's class, a cached hash code, and some other infrastructure data). So even an empty String already takes about 45 bytes in memory (as you can see here). Now you just have to add the maximum length of your strings and do some simple calculations to get the maximum memory use of that list.

Anyway, if you have memory issues, I would suggest loading the strings as byte[]. That way you control the encoding, and you can still search them.
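A minimal sketch of the byte[] approach, under some assumptions that are not in the answer: the class name ByteUrlTable is hypothetical, the URLs are assumed to contain only characters that fit in one byte (ISO-8859-1), and sorting the rows so that lookups can use binary search is one possible way to "still search", not the only one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: keeps URLs as sorted one-byte-per-char arrays. */
final class ByteUrlTable {
    private final byte[][] rows;

    ByteUrlTable(List<String> urls) {
        rows = new byte[urls.size()][];
        for (int i = 0; i < rows.length; i++) {
            // Assumption: every character fits in ISO-8859-1 (one byte each).
            rows[i] = urls.get(i).getBytes(StandardCharsets.ISO_8859_1);
        }
        // Sort once so that each lookup is a binary search.
        Arrays.sort(rows, ByteUrlTable::compare);
    }

    /** Lexicographic comparison that treats bytes as unsigned values. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    boolean contains(String url) {
        byte[] key = url.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.binarySearch(rows, key, ByteUrlTable::compare) >= 0;
    }

    public static void main(String[] args) {
        ByteUrlTable table = new ByteUrlTable(Arrays.asList(
                "http://b.example/x", "http://a.example/", "http://c.example/y"));
        System.out.println(table.contains("http://a.example/"));
        System.out.println(table.contains("http://missing.example/"));
    }
}
```

Each row still carries its own array header, so this saves the String wrapper and the second byte per character rather than all per-entry overhead.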

Upvotes: 2

Miserable Variable

Reputation: 28761

Is there some reason you need to restrict it to 1 GB? If you want to search through them, you definitely don't want to swap to disk, but if the machine has more memory, it makes sense to go higher than 1 GB.

If you have to search, use a SortedSet, not an ArrayList.
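As a rough illustration of the SortedSet suggestion, here is a small sketch using TreeSet, the standard SortedSet implementation. The class name UrlSetDemo and the sample URLs are made up for the example.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical demo: lookups in a sorted set instead of a linear list scan. */
final class UrlSetDemo {

    /** Copies the given lines into a TreeSet; duplicate lines are dropped. */
    static SortedSet<String> build(Iterable<String> lines) {
        SortedSet<String> urls = new TreeSet<>();
        for (String line : lines) {
            urls.add(line);
        }
        return urls;
    }

    public static void main(String[] args) {
        SortedSet<String> urls = build(Arrays.asList(
                "http://example.com/b", "http://example.com/a", "http://example.com/b"));
        // contains() walks the tree in O(log n) instead of scanning every entry.
        System.out.println(urls.contains("http://example.com/a"));
        System.out.println(urls.size());
    }
}
```

Keep in mind that a TreeSet allocates a tree node per element, so it uses more memory per entry than an ArrayList. With a tight heap limit, sorting the list once and calling Collections.binarySearch gives the same logarithmic lookup without that extra cost.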

Upvotes: 1

Peter Lawrey

Reputation: 533780

Some Java 6 JVMs support -XX:+UseCompressedStrings (the option was removed again in Java 7), which stores strings that use only ASCII as a byte[] internally. From Java 9 onwards, compact strings are enabled by default and do the same for strings that contain only Latin-1 characters.

Having several GB of text in a List isn't a problem, but it can take a while (many seconds) to load from disk.

If the average URL is 50 ASCII characters, with 32 bytes of overhead per String, 5 million entries would use about 400 MB, which isn't much for a modern PC or server.
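The estimate above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the class and method names are hypothetical, the 32-byte overhead is the figure from the answer, and the two-bytes-per-character case is added for comparison with uncompressed strings.

```java
/** Hypothetical calculator for the rough per-String memory estimate. */
final class StringMemoryEstimate {

    /** entries * (characters * bytesPerChar + fixed overhead per String). */
    static long estimateBytes(long entries, int avgChars, int bytesPerChar, int overheadPerString) {
        return entries * ((long) avgChars * bytesPerChar + overheadPerString);
    }

    public static void main(String[] args) {
        // One byte per character (ASCII stored as byte[]): 5M * (50 + 32)
        System.out.println(estimateBytes(5_000_000L, 50, 1, 32)); // prints 410000000
        // Two bytes per character (uncompressed char[]): 5M * (100 + 32)
        System.out.println(estimateBytes(5_000_000L, 50, 2, 32)); // prints 660000000
    }
}
```

Both figures leave out the list's own array of references (a few bytes per entry), but either one fits within the 1 GB heap from the question.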

Upvotes: 10
