Chris Grimm

Reputation: 771

Java File IO vs Local database

I am working on a project that involves parsing through a LARGE amount of data rapidly. Currently this data is on disk and broken down into a directory hierarchy:

(Folder: DataSource) -> (Files: Day1, Day2, Day3...Day1000...)
(Folder: DataSource2) -> (Files: Day1, Day2, Day3...Day1000...) 
...
(Folder: DataSource1000) -> ...
...

Each Day file consists of entries that need to be accessed very quickly.

My initial plan was to use traditional file I/O in Java to access these files, but after further reading I began to fear that this might be too slow.

In short, what is the fastest way I can selectively load entries from my filesystem from varying DataSources and Days?

Upvotes: 3

Views: 2399

Answers (3)

Beryllium

Reputation: 12998

what is the fastest way I can selectively load entries from my filesystem from varying DataSources and Days?

"Selectively" means filtering, so my answer is a localhost database. Generally speaking, if you filter, sort, paginate, or extract distinct records from a large number of records, it's hard to beat a localhost SQL server. You get a query optimizer (nobody writes that in Java), a cache (which takes effort in Java, especially the invalidation), database indexes (I have not seen those done in Java either), etc. It's possible to implement these things manually, but then you are writing a database in Java.

On top of this, you gain access to higher-level SQL functions such as window aggregates, so in most cases there is no need to post-process data in Java.
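To make the filtering point concrete, here is a minimal JDBC sketch. The table `entries(data_source, day_no, payload)`, its column names, and the `jdbcUrl` parameter are all assumptions for illustration, not something from the question; you would also need a JDBC driver for your database on the classpath and, ideally, an index on `(data_source, day_no)`.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class EntryQuery {
    // Hypothetical schema: entries(data_source, day_no, payload).
    // The database, not the Java code, decides how to use its indexes for this filter.
    static final String SQL =
        "SELECT payload FROM entries "
      + "WHERE data_source = ? AND day_no BETWEEN ? AND ? "
      + "ORDER BY day_no";

    // Loads the payloads of one data source for an inclusive range of days.
    static List<String> load(String jdbcUrl, String dataSource, int firstDay, int lastDay)
            throws SQLException {
        List<String> payloads = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement stmt = conn.prepareStatement(SQL)) {
            stmt.setString(1, dataSource);
            stmt.setInt(2, firstDay);
            stmt.setInt(3, lastDay);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    payloads.add(rs.getString("payload"));
                }
            }
        }
        return payloads;
    }
}
```

The selection logic lives entirely in the SQL string, so changing the filter later means editing the query rather than rewriting file-parsing code.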

Upvotes: 0

Naveen Babu

Reputation: 1584

The issue could be solved both ways, but it depends on a few factors.

Go for file I/O:

  1. if the volume is less than millions of rows
  2. if you don't run complicated queries, like Jon Skeet said
  3. if you fetch rows using the folder name ("DataSource") as the key
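For the file I/O case, a rough sketch of keyed access could look like the following. It assumes a layout of `<root>/<DataSource>/<Day>` with one entry per line in UTF-8; the class and method names are made up for illustration, and the real entry format in the question is unknown.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DayFileReader {
    // Assumed layout: <root>/<dataSource>/<day>, so the folder and file names act as the key.
    static Path dayFile(Path root, String dataSource, String day) {
        return root.resolve(dataSource).resolve(day);
    }

    // Streams one Day file line by line and keeps only the entries matching the filter.
    static List<String> loadEntries(Path root, String dataSource, String day,
                                    Predicate<String> filter) throws IOException {
        try (Stream<String> lines =
                 Files.lines(dayFile(root, dataSource, day), StandardCharsets.UTF_8)) {
            return lines.filter(filter).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Builds a tiny throwaway hierarchy so the sketch can be run as-is.
        Path root = Files.createTempDirectory("data");
        Path dir = Files.createDirectories(root.resolve("DataSource1"));
        Files.write(dir.resolve("Day1"), List.of("a,1", "b,2", "a,3"));
        System.out.println(loadEntries(root, "DataSource1", "Day1", l -> l.startsWith("a,")));
    }
}
```

This only opens the one file you ask for, which is why it works well when the DataSource and Day are the lookup key. Any filter on the entry contents still scans the whole file.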

Go for a DB:

  1. if you see your program reading through millions of records
  2. if you need complicated selections, even of multiple rows, using a single select
  3. if you know how to create a basic table structure for a DB

Upvotes: 6

mel3kings

Reputation: 9405

Depending on the architecture you are using, you can implement caching in different ways. JBoss has built-in JBoss caching, and there are also third-party open-source tools that provide caching, such as Redis or EhCache, depending on your needs. Basically, a cache stores objects in memory; some are passivated/activated on demand, and when memory is exhausted they are written to disk as physical files, which the caching mechanism can just as easily activate and marshal again. This lowers the number of database connections held by your program. There are other caches, but these are the ones I've worked with.
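To illustrate just the in-memory part of that idea, here is a small least-recently-used cache built on the JDK's `LinkedHashMap`. It is a stand-in and not the JBoss, Redis, or EhCache API: it has no passivation to disk, and the `DayFileCache` name and the loader function are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class DayFileCache<K, V> {
    private final Map<K, V> entries;
    private final Function<K, V> loader;

    // capacity: maximum number of cached values; loader: called on a cache miss.
    DayFileCache(int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps the least recently used entry at the head of the map.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict once the cache grows past its capacity
            }
        };
    }

    // Returns the cached value, loading and storing it first if it is missing.
    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            entries.put(key, value);
        }
        return value;
    }

    synchronized int cachedCount() {
        return entries.size();
    }
}
```

In the setup from the question, the key could be something like "DataSource1/Day1" and the loader could read and parse that file, so repeated requests for the same day skip the disk.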

Upvotes: 2
