Reputation: 22963
Here's the case: I am creating a batch script that runs daily, parsing logfiles and exporting the data to a database. The format of this file is basically
std_prop1;std_prop2;std_prop3;[opt_prop1;[opt_prop2;[opt_prop3;[..]]]
The standard properties map to a table with a column for each property, where each line in the logfile basically maps to a corresponding row. It might look like LOGDATA(id,timestamp,systemId,methodName,callLenght)
. Since we should be able to log as many optional properties as we like, we cannot map them to the same table, since that would mean adding a row the table every time a new property was introduced. Not to think of the number of NULL
references ...
So the additional properties go in another table, say EXTRA_PROPS(logdata_foreign_key,propname,value)
. In reality, most of the optional properties are the same (e.g. os version, app container, etc), making it somewhat wasteful to log for instance 4 rows in EXTRA_PROPS for each row in LOGDATA (in the case that one on average had 4 extra properties). So what I would like my batch job to do is
for each additionalProperty in logRow:
see if additionalProperty already exist
if exists:
create a reference to it in a reference table
if not:
add the property to the extra properties table
create a reference to it in a reference table
I would then probably have three slightly different tables:
LOGDATA(id,timestamp,systemId,methodName,callLenght)
EXTRA_PROPS(id,propname,value)
LOGDATA_HAS_EXTRA_PROPS(logid,extra_prop_id)
I am not 100% this is a better way of doing it, I would still create N rows in the LOGDATA_HAS_EXTRA_PROPS
table for N properties, but at least I would not add any new rows to EXTRA_PROPS
.
Even if this might not be the best way (what is?), I am still wondering about the tecnhical side: How would I implement this using Hibernate? It does not have to be superfast, but it would need to chew through 100K+ rows.
Upvotes: 0
Views: 146
Reputation: 24732
Firstly, I would not recommend using Hibernate for this type of logic. Hibernate is a great product but doing this kind of high load data operations may not be it's strongest point.
From data modeling standpoint, it appears to me that (propname,value)
is actually a primary key in EXTRA_PROPS
. Basically, you want to express the logic that, for example, hostname
+ foo.bar.com
combination will only appear once in the table. Am I right? That would be PK. So you will need to use that in LOGDATA_HAS_EXTRA_PROPS
. Using name
alone will not be sufficient for reference.
In Hibernate (if you choose to use it), that can be expressed via composite key using @EmbeddedId
or Embeddable
on object mapped to EXTRA_PROPS
. And then you can have many to many relationship that uses LOGDATA_HAS_EXTRA_PROPS
as association table.
Upvotes: 1