Shane

Reputation: 2047

Reading "tables" in C++

I am currently reading the TTF specification and notice that it talks a lot about "tables". My understanding is that these tables represent data structures, but my googling has turned up no explanation of how to extract this data from a TTF file: how to work out how many bytes each table occupies, how to tell one table from another, or how to identify each piece of data within a table.

Currently all I know is that I am given the data type of each piece of information (e.g. here), which might help me identify each piece of data in a table, but that comes back to the problem of not knowing how to locate the tables themselves in the file.

If anyone could explain the theory behind how this works, or show any short but well-commented code snippets to help me understand this better, it would be much appreciated.

Upvotes: 0

Views: 135

Answers (2)

Tony Delroy

Reputation: 106126

user3159253's quite right - the very doc you link to provides all the information you need... for example:

Note that the searchRange, the entrySelector and the rangeShift are all multiplied by 16 which represents the size of a directory entry.

Table 4 : The offset subtable

Type    Name    Description
uint32  scaler type     A tag to indicate the OFA scaler to be used to rasterize this font; see the note on the scaler type below for more information.
uint16  numTables   number of tables
uint16  searchRange     (maximum power of 2 <= numTables)*16
uint16  entrySelector   log2(maximum power of 2 <= numTables)
uint16  rangeShift  numTables*16-searchRange
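
As a quick check of the three derived fields in the table above, here is a minimal sketch that computes them from numTables. SearchFields and compute_search_fields are names made up for this illustration, and the code assumes numTables is at least 1.

```cpp
#include <cstdint>

// Hypothetical helper type: the three derived fields of the offset subtable.
struct SearchFields
{
    std::uint16_t searchRange;
    std::uint16_t entrySelector;
    std::uint16_t rangeShift;
};

// Applies the formulas from Table 4. 16 is the size in bytes of one
// table directory entry. Assumes numTables >= 1.
inline SearchFields compute_search_fields(std::uint16_t numTables)
{
    std::uint16_t power = 1;     // largest power of 2 <= numTables
    std::uint16_t exponent = 0;  // log2 of that power
    while (static_cast<std::uint32_t>(power) * 2 <= numTables)
    {
        power *= 2;
        ++exponent;
    }

    SearchFields f;
    f.searchRange = static_cast<std::uint16_t>(power * 16);
    f.entrySelector = exponent;
    f.rangeShift = static_cast<std::uint16_t>(numTables * 16 - f.searchRange);
    return f;
}
```

For a font with 10 tables this gives searchRange 128, entrySelector 3 and rangeShift 32.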

Your related code might look something like:

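#include <cstdint>

// The spec's type names, mapped onto fixed-width integers.
using uint32 = std::uint32_t;
using uint16 = std::uint16_t;

// Mirrors Table 4. TTF stores integers in big-endian order, so on a
// little-endian host the fields must be byte-swapped before the
// comparisons below give the right answer.
struct Offset_Subtable
{
    uint32 scaler_type_;
    uint16 numTables_;
    uint16 searchRange_;
    uint16 entrySelector_;
    uint16 rangeShift_;

    bool is_for_macOS() const { return scaler_type_ == 0x74727565; }
    bool is_for_windows() const { return scaler_type_ == 0x00010000; }
    bool is_truetype() const { return scaler_type_ == 0x74727565 ||
                               scaler_type_ == 0x00010000; }
    bool is_postscript() const { return scaler_type_ == 0x74797031; }
};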

Given:

Table 5: The table directory

Type    Name    Description
uint32  tag     4-byte identifier
uint32  checkSum    checksum for this table
uint32  offset  offset from beginning of sfnt
uint32  length  length of this table in byte (actual length not padded length)

You might keep coding:

struct Table_Directory_Entry
{
    uint32 tag_;
    uint32 checkSum_;
    uint32 offset_;
    uint32 length_;
};

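const void* p_ttf = ...;
const Offset_Subtable* p_os = static_cast<const Offset_Subtable*>(p_ttf);
... use any data you're interested in...
// The table directory starts immediately after the offset subtable.
const Table_Directory_Entry* p_dir =
    reinterpret_cast<const Table_Directory_Entry*>(p_os + 1);
// numTables_ must already be in host byte order here.
for (int table_num = 0; table_num < p_os->numTables_; ++table_num)
{
    const Table_Directory_Entry* p_tde = &p_dir[table_num];
    ... use the table directory entry data ...
}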

...etc....
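
One caveat with the casting approach: TTF files store multi-byte integers in big-endian order, so on a little-endian machine the struct fields come out byte-swapped. Below is a minimal sketch of an alternative that decodes each field byte by byte and also checks that the buffer is long enough. read_u16, read_u32, TableRecord and read_table_directory are names made up for this illustration, not anything from the spec.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode big-endian integers one byte at a time, so the result does not
// depend on the byte order of the host.
inline std::uint16_t read_u16(const unsigned char* p)
{
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

inline std::uint32_t read_u32(const unsigned char* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
            static_cast<std::uint32_t>(p[3]);
}

// Hypothetical host-order copy of one table directory entry (Table 5).
struct TableRecord
{
    std::uint32_t tag;
    std::uint32_t checkSum;
    std::uint32_t offset;
    std::uint32_t length;
};

// Returns every directory entry, or an empty vector if the buffer is too
// short to hold the offset subtable plus the directory it announces.
inline std::vector<TableRecord> read_table_directory(const unsigned char* data,
                                                     std::size_t size)
{
    std::vector<TableRecord> records;
    const std::size_t header_size = 12;  // offset subtable, Table 4
    const std::size_t entry_size = 16;   // one directory entry, Table 5
    if (size < header_size)
        return records;

    const std::size_t num_tables = read_u16(data + 4);
    if (size < header_size + num_tables * entry_size)
        return records;

    for (std::size_t i = 0; i < num_tables; ++i)
    {
        const unsigned char* e = data + header_size + i * entry_size;
        records.push_back({read_u32(e), read_u32(e + 4),
                           read_u32(e + 8), read_u32(e + 12)});
    }
    return records;
}
```

Each record's offset and length then tell you where that table's bytes sit, measured from the start of the file data.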

Now, some of the raw data is quite confusing ("magic" sentinel values, integers where you'd prefer to see results as values from an enum, numbers that need to be multiplied or divided or offset to yield the value someone might intuitively expect them to hold), so adding little helper functions like is_truetype() is a good start, and you can go a bit further by preventing use of the raw fields by making them private while exposing is_truetype() et al as public member functions.
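
To illustrate that idea, here is a minimal sketch that keeps the raw value private and exposes only named queries, plus a helper that turns a 4-byte tag into readable text. ScalerType and tag_to_string are hypothetical names, and both assume the value has already been converted to host byte order.

```cpp
#include <cstdint>
#include <string>

// Hypothetical wrapper: callers cannot touch the raw magic number,
// only ask questions about it.
class ScalerType
{
public:
    explicit ScalerType(std::uint32_t raw) : raw_(raw) {}

    bool is_truetype() const { return raw_ == 0x74727565 || raw_ == 0x00010000; }
    bool is_postscript() const { return raw_ == 0x74797031; }

private:
    std::uint32_t raw_;  // host-order value of the "scaler type" field
};

// Converts a host-order tag such as 0x68656164 into its four
// characters, most significant byte first.
inline std::string tag_to_string(std::uint32_t tag)
{
    std::string s(4, ' ');
    s[0] = static_cast<char>((tag >> 24) & 0xFF);
    s[1] = static_cast<char>((tag >> 16) & 0xFF);
    s[2] = static_cast<char>((tag >> 8) & 0xFF);
    s[3] = static_cast<char>(tag & 0xFF);
    return s;
}
```

Seen this way, the magic number 0x74727565 is just the ASCII text "true", and 0x74797031 is "typ1".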

If you need your structures to hold extra data that's not part of the TTF content, this approach of casting the raw data to your structure breaks down. Instead, you could use one of these raw-data-only structures for convenient interpretation of the raw memory, in support of a higher-level class that's created on the stack, on the heap, or at global/static scope and that initialises itself from a pointer or reference to the raw-data structure. This could be done in a constructor, an assignment operator, a streaming operator>>, or any general member function, whichever suits your code. You can then pull out the parsed data and keep it in standard containers.
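
A minimal sketch of such a higher-level class follows. It parses the table directory once in its constructor and keeps the result in a std::map keyed by tag. FontDirectory, Span and be are hypothetical names, and for simplicity it reads the raw bytes directly (big-endian) rather than going through a raw-data struct.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical higher-level class: owns parsed data, not raw file bytes.
class FontDirectory
{
public:
    struct Span
    {
        std::uint32_t offset;  // from the beginning of the file data
        std::uint32_t length;  // actual (unpadded) length in bytes
    };

    // Parses the offset subtable and table directory. If the buffer is
    // too short, the directory is simply left empty.
    FontDirectory(const unsigned char* data, std::size_t size)
    {
        if (size < 12)
            return;
        const std::size_t num_tables = be(data + 4, 2);
        if (size < 12 + num_tables * 16)
            return;
        for (std::size_t i = 0; i < num_tables; ++i)
        {
            const unsigned char* e = data + 12 + i * 16;
            const std::string tag(reinterpret_cast<const char*>(e), 4);
            tables_[tag] = Span{be(e + 8, 4), be(e + 12, 4)};
        }
    }

    std::size_t size() const { return tables_.size(); }

    // Returns the location of a table such as "head", or nullptr if absent.
    const Span* find(const std::string& tag) const
    {
        std::map<std::string, Span>::const_iterator it = tables_.find(tag);
        return it == tables_.end() ? nullptr : &it->second;
    }

private:
    // Reads an n-byte big-endian unsigned integer (n <= 4).
    static std::uint32_t be(const unsigned char* p, int n)
    {
        std::uint32_t v = 0;
        for (int i = 0; i < n; ++i)
            v = (v << 8) | p[i];
        return v;
    }

    std::map<std::string, Span> tables_;
};
```

Once constructed, the object no longer depends on the raw buffer's layout, so you are free to add members that have nothing to do with the file format.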

Upvotes: 0

user3159253

Reputation: 17455

The first approach is to carefully study the Apple documentation you mention; it does contain descriptions of the data structures. You can also take a look at libfreetype to learn how these data structures are processed in real-world code.

Upvotes: 1
