kennyc

Reputation: 5710

Performance implications of using a flatter schema

I'm using FlatBuffers (C++) to store metadata information about a file. This includes EXIF, IPTC, GPS and various other metadata values.

In my current schema, I have a fairly normalized definition whereby each of the groups listed above has its own table. The root table just includes properties for each sub-table.

Basic Example:

table GPSProperties {
  latitude:double;
  longitude:double;
}

table ContactProperties {
  name:string;
  email:string;
}

table EXIFProperties {
  camera:string;
  lens:string;
  gps:GPSProperties;
}

table IPTCProperties {
  city:string;
  country:string;
  contact:ContactProperties;
}

table Registry {
  exifProperties:EXIFProperties;
  iptcProperties:IPTCProperties;
}

root_type Registry;

This works, but the nesting restrictions when building a buffer are starting to make the code pretty messy. Also, splitting the properties into separate tables only serves to make the schema clearer.
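The restriction can be illustrated with a toy C++ model. `ToyBuilder` below is hypothetical, not the FlatBuffers API: it only mimics the check that `flatbuffers::FlatBufferBuilder` performs internally, which rejects creating a string or starting a table while another table is still open. That rule is what forces a nested schema to be built bottom-up, leaves first and root last.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical toy model, NOT the FlatBuffers API. It only mimics the rule
// that FlatBufferBuilder enforces: strings, vectors and sub-tables cannot be
// created while another table is still being built.
class ToyBuilder {
 public:
  std::uint32_t CreateString(const std::string& /*contents ignored in toy*/) {
    NotNested();
    return next_offset_++;  // stand-in for a real buffer offset
  }
  void StartTable() {
    NotNested();
    nested_ = true;
  }
  void AddOffset(std::uint32_t /*child*/) {
    // Toy sanity check: offsets can only be added to an open table.
    if (!nested_) throw std::logic_error("AddOffset outside a table");
  }
  std::uint32_t EndTable() {
    if (!nested_) throw std::logic_error("EndTable without StartTable");
    nested_ = false;
    return next_offset_++;
  }

 private:
  void NotNested() const {
    if (nested_) throw std::logic_error("object created while a table is open");
  }
  bool nested_ = false;
  std::uint32_t next_offset_ = 1;
};

// Bottom-up order for the normalized schema: leaves first, root last.
std::uint32_t BuildNormalized(ToyBuilder& b) {
  std::uint32_t camera = b.CreateString("example-camera");
  std::uint32_t lens = b.CreateString("example-lens");
  b.StartTable();  // GPSProperties: only doubles (not modeled here)
  std::uint32_t gps = b.EndTable();
  b.StartTable();  // EXIFProperties refers to its three children
  b.AddOffset(camera);
  b.AddOffset(lens);
  b.AddOffset(gps);
  std::uint32_t exif = b.EndTable();
  b.StartTable();  // Registry (IPTC branch omitted for brevity)
  b.AddOffset(exif);
  return b.EndTable();
}

// Top-down order: open EXIFProperties first, then try to create its string.
bool TopDownIsRejected() {
  ToyBuilder b;
  b.StartTable();
  try {
    b.CreateString("example-camera");
  } catch (const std::logic_error&) {
    return true;  // the nesting check fired
  }
  return false;
}
```

A flat `Registry` still has to create all of its strings before the table is started, but it needs only one `StartTable`/`EndTable` pair, so the ordering constraint reduces to "strings first, then the one table".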

I'm considering "flattening" the entire schema into a single table, but I'm wondering whether that has any performance or memory implications. This single table could have a few hundred fields, though most would be empty.

Proposal:

table Registry {
  exif_camera:string;
  exif_lens:string;
  exif_gps_latitude:double;
  exif_gps_longitude:double;
  iptc_city:string;
  iptc_country:string;
  iptc_contact_name:string;
  iptc_contact_email:string;
}

root_type Registry;

Since properties that are either not set or set to their default value don't take up any space in the buffer, I'm inclined to think that a flattened schema might not be a problem, but I'm not certain.
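As a rough check of that intuition, here is a simplified C++ model of the one cost that unset fields can still carry: vtable entries. It assumes the vtable layout described in the FlatBuffers internals documentation (a 4-byte header holding the vtable size and inline table size, then one 2-byte offset per field up to the highest-numbered field that is present, with absent and default-valued fields not stored in the table itself). `VtableBytes` is a hypothetical helper, not part of the FlatBuffers API.

```cpp
#include <cstddef>
#include <vector>

// Simplified model of a FlatBuffers vtable (an assumption based on the binary
// format, not library code): two 16-bit header fields followed by one 16-bit
// offset per field, up to and including the highest-numbered present field.
// Absent fields after that one cost nothing; absent fields before it still
// cost a 2-byte zero entry each.
std::size_t VtableBytes(const std::vector<bool>& present) {
  std::size_t entries = 0;
  for (std::size_t i = 0; i < present.size(); ++i) {
    if (present[i]) entries = i + 1;  // vtable must reach field i
  }
  return 2 * 2 + 2 * entries;  // header + per-field offsets
}
```

In this model, a table that populates only its first eight fields pays 20 bytes of vtable no matter how many hundreds of fields are declared after them, while populating only field #299 of 300 pays 604 bytes. If that matters, declaring the commonly populated fields first keeps vtables short, and within one buffer FlatBuffers deduplicates identical vtables by default, so the cost is paid once per distinct layout.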

(Note that performance is my primary concern, followed closely by memory usage. The normalized schema is performing excellently, but I think a flattened schema would really help me clean up my codebase.)

Upvotes: 0

Views: 595

Answers (3)

UKMonkey

Reputation: 6983

"This single table could have a few hundred fields, though most would be empty."

The performance cost is likely to be so small that you won't notice it, but to me, the sentence quoted above is the deciding factor in which design to use.

Others are talking about the cost of vtables, but I wouldn't worry about that at all. There's a single vtable per class, prepared once per run, and it will not be expensive. Having hundreds of strings that are empty and unused, however, is going to be very expensive in terms of memory usage and a drain on every object you create. In addition, reading your fields will become much more complex, since you can no longer assume that all the data for the class is there when you read it.

If most or all of the fields were always present, I could see the attraction of making a single class, but they're not.

Upvotes: 0

Aardappel

Reputation: 6074

Since most of your data is strings, the size and speed of both of these designs will be very similar, so you should probably choose based on what works better for you from a software engineering perspective.

That said, the flat version will likely be slightly more efficient in size (fewer vtables) and will certainly be faster to access (though again, the difference is marginal given that it is mostly string data).

The only way in which the flat version could be less efficient is if you were to store a lot of them in one buffer, with the set of fields that are present varying widely from table to table. In that case, the non-flat version may get more vtable sharing.
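The vtable-sharing trade-off can be sketched in C++ with a simplified size model (4 header bytes plus 2 bytes per field up to the last present one) and a deliberately crude sharing rule: within a buffer, tables of the same type with the same field-presence pattern share one vtable. The real builder compares serialized vtable bytes, so it may share even more. All names here (`Record`, `FlatVtableBytes`, `NestedVtableBytes`) are hypothetical, and the schema is cut down to two EXIF and two IPTC string fields.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// One table's field-presence pattern, in schema declaration order.
using Pattern = std::vector<bool>;

// Simplified vtable-size model (assumption, not library code).
std::size_t VtableBytes(const Pattern& present) {
  std::size_t entries = 0;
  for (std::size_t i = 0; i < present.size(); ++i) {
    if (present[i]) entries = i + 1;
  }
  return 4 + 2 * entries;
}

// Crude sharing rule: each distinct pattern of one table type is paid once.
std::size_t SharedVtableBytes(const std::vector<Pattern>& tables) {
  std::set<Pattern> distinct(tables.begin(), tables.end());
  std::size_t total = 0;
  for (const Pattern& p : distinct) total += VtableBytes(p);
  return total;
}

// Hypothetical cut-down record: EXIF {camera, lens}, IPTC {city, country}.
struct Record {
  Pattern exif;
  Pattern iptc;
};

// Flat schema: one Registry table holding all four fields.
std::size_t FlatVtableBytes(const std::vector<Record>& records) {
  std::vector<Pattern> registry;
  for (const Record& r : records) {
    Pattern p = r.exif;
    p.insert(p.end(), r.iptc.begin(), r.iptc.end());
    registry.push_back(p);
  }
  return SharedVtableBytes(registry);
}

// Nested schema: Registry {exif, iptc} plus one sub-table of each kind.
std::size_t NestedVtableBytes(const std::vector<Record>& records) {
  std::vector<Pattern> registry, exif, iptc;
  for (const Record& r : records) {
    registry.push_back({true, true});  // both sub-table offsets set
    exif.push_back(r.exif);
    iptc.push_back(r.iptc);
  }
  return SharedVtableBytes(registry) + SharedVtableBytes(exif) +
         SharedVtableBytes(iptc);
}
```

With a single fully populated record per buffer, this model gives 12 vtable bytes for the flat layout and 24 for the nested one. With four records covering every combination of "camera only / camera and lens" and "city only / city and country", it gives 44 for flat and 36 for nested. The model ignores inline table data and the extra 4-byte offsets that nesting adds, so treat the numbers as illustrating the trend only.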

In the non-flat version, tables like GPSProperties could be a struct if the fields are unlikely to ever change, which would be more efficient.

Upvotes: 0

Shivendra Agarwal

Reputation: 688

Some basics you should be clear on first:

  1. Every table begins with an offset to a vtable, which records the offset at which each field of the table can be found. If a table has too many fields, this vtable can grow huge, because it needs a 2-byte entry for every field up to the highest-numbered one that is set, whether or not the fields before it hold data.

  2. If you create a hierarchy of tables, you create extra vtables and also add an extra indirection for each level of nesting.

  3. Also, vtables are shared between objects in the same buffer that store the same set of fields, for example, if you create many objects that only use the exif_camera field.

So it depends: if your data is going to be huge and heterogeneous, use the more organized hierarchy; if it is going to be homogeneous, prefer a flattened table.

Upvotes: 1
