How is the table RDF_PREFIX used in Virtuoso?

Question

Virtuoso stores RDF triples in the RDF_QUAD table. In this table, IRIs are stored with the IRI_ID datatype, and their string values are stored in the RDF_IRI table. But I don't understand the use of the RDF_PREFIX table. Is it there to reduce the space used by the RDF_IRI table? If so, how is the join done? (The RDF_PREFIX table has an integer key.) The documentation doesn't explain this.

Joshua Taylor · Accepted Answer

For context (which, admittedly, doesn't explain why there are two tables rather than just one), the documentation says:

  create table DB.DBA.RDF_PREFIX (
    RP_NAME varchar primary key,
    RP_ID int not null unique );
  create table DB.DBA.RDF_IRI (
    RI_NAME varchar primary key,
    RI_ID IRI_ID not null unique );

These two tables store a mapping between internal IRI IDs and their external string form. A memory-resident cache contains recently used IRIs to reduce access to this table. Function id_to_iri (in id IRI_ID) returns the IRI by its ID. Function iri_to_id (in iri varchar, in may_create_new_id) returns an IRI_ID for the given string; if the string has not been used before as an IRI, then either NULL is returned or a new ID is allocated, depending on the second argument.
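
To make the documented behaviour of iri_to_id and id_to_iri concrete, here is a toy in-memory model in Python. It is a hypothetical sketch, not Virtuoso's implementation: the class name, the sequential integer IDs and the plain dicts are assumptions standing in for the RDF_IRI table and its cache. Only the "return NULL or allocate a new ID, depending on may_create_new_id" behaviour comes from the documentation quoted above.

```python
from typing import Dict, Optional


class ToyIriDictionary:
    """Hypothetical in-memory stand-in for RDF_IRI plus its cache.

    Only the may_create_new_id behaviour mirrors the documentation;
    the names and the sequential integer IDs are assumptions.
    """

    def __init__(self) -> None:
        self._id_by_name: Dict[str, int] = {}  # like RI_NAME -> RI_ID
        self._name_by_id: Dict[int, str] = {}  # reverse lookup for id_to_iri
        self._next_id = 1

    def iri_to_id(self, iri: str, may_create_new_id: bool) -> Optional[int]:
        """Return the ID for iri; for an unseen IRI, allocate one or return None."""
        known = self._id_by_name.get(iri)
        if known is not None or not may_create_new_id:
            return known  # existing ID, or None when allocation is not allowed
        new_id = self._next_id
        self._next_id += 1
        self._id_by_name[iri] = new_id
        self._name_by_id[new_id] = iri
        return new_id

    def id_to_iri(self, iri_id: int) -> Optional[str]:
        """Return the IRI string for iri_id, or None if the ID is unknown."""
        return self._name_by_id.get(iri_id)


if __name__ == "__main__":
    d = ToyIriDictionary()
    print(d.iri_to_id("http://example.org/a", False))  # None: unseen, no allocation
    new_id = d.iri_to_id("http://example.org/a", True)  # allocates a new ID
    print(new_id, d.id_to_iri(new_id))
```

In Virtuoso itself the IDs have the IRI_ID type rather than being plain integers, which is one of the differences from RDF_PREFIX's int RP_ID.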

Notice that the RP_ID of RDF_PREFIX is an int, whereas the RI_ID of RDF_IRI is an IRI_ID. Even though both tables have varchar primary keys, the IDs to which they map those names are not the same type. In fact, it appears that even though the primary key of RDF_IRI is a varchar, it isn't the same kind of varchar as the one in RDF_PREFIX; the following example shows this, I think. Example 1.5.44, "How can I perform search for predicate values", from the documentation shows RDF_PREFIX in use. I'm not enough of a SQL'er to say exactly what's happening in that example, but it is a good place to start when looking at how RDF_PREFIX gets used. Here's a snippet from that example:

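  -- path and dump_iri_ids are presumably parameters of the enclosing procedure.
  -- Visit every prefix whose text starts with path.
  for ( SELECT RP_NAME, RP_ID 
        FROM RDF_PREFIX
        WHERE (RP_NAME >= path) AND (RP_NAME < path || chr(255)) ) do
    {
      -- Pack the integer RP_ID into a 4-character string, most significant byte first.
      declare fourbytes varchar;
      fourbytes := '----';
      fourbytes[0] := bit_shift (RP_ID, -24);
      fourbytes[1] := bit_and (bit_shift (RP_ID, -16), 255);
      fourbytes[2] := bit_and (bit_shift (RP_ID, -8), 255);
      fourbytes[3] := bit_and (RP_ID, 255);

      -- Visit every RI_NAME that begins with those four bytes.
      for ( SELECT RI_NAME, RI_ID from RDF_IRI
            WHERE (RI_NAME >= fourbytes) AND (RI_NAME < fourbytes || chr(255)) ) do
        {
          -- Report the IRI only if it is used as a predicate in some quad;
          -- its text is the prefix followed by RI_NAME without its first four bytes.
          if (exists (SELECT TOP 1 1 FROM RDF_QUAD WHERE P=RI_ID))
            result (case when (dump_iri_ids) then RI_ID else RP_NAME || subseq (RI_NAME, 4) end);
        }
    }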

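Notice that the varchar fourbytes used for retrieving rows from RDF_IRI is constructed by bit-shifting the int that comes from the RDF_PREFIX table: the four assignments store the bytes of RP_ID, most significant first. I'm not enough of a SQL'er to explain all the details, but it looks to me like the keys of RDF_PREFIX and RDF_IRI really are different kinds of varchars. In RDF_PREFIX, RP_NAME actually looks like an IRI (or rather the beginning of one), but RDF_IRI's RI_NAME is just a sequence of bytes: the four bytes of a prefix's RP_ID followed by the rest of the IRI. That is why the snippet rebuilds the full IRI as RP_NAME || subseq (RI_NAME, 4), and it suggests that RDF_PREFIX does exist to save space in RDF_IRI, with the "join" done on the first four bytes of RI_NAME rather than on a separate column.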
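
If that reading is right, RDF_PREFIX is a form of prefix compression: each distinct IRI prefix is stored once, and RDF_IRI keeps only four bytes standing for the prefix plus the remainder of the IRI. This Python sketch models that layout under stated assumptions. Only the four-byte, most-significant-first encoding, the two nested range scans and the RP_NAME || subseq (RI_NAME, 4) reconstruction come from the snippet; the split rule in split_iri, the sequential IDs and all class and function names are hypothetical, since nothing quoted here says how Virtuoso decides where a prefix ends.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def encode_prefix_id(rp_id: int) -> bytes:
    """Four bytes, most significant first, as the bit_shift/bit_and lines build them."""
    return bytes([
        (rp_id >> 24) & 0xFF,
        (rp_id >> 16) & 0xFF,
        (rp_id >> 8) & 0xFF,
        rp_id & 0xFF,
    ])


def split_iri(iri: str) -> Tuple[str, str]:
    """HYPOTHETICAL split rule: the prefix ends after the last '/' or '#'."""
    cut = max(iri.rfind("/"), iri.rfind("#")) + 1
    return iri[:cut], iri[cut:]


class PrefixCompressedStore:
    """Toy model of RDF_PREFIX + RDF_IRI as the snippet appears to use them."""

    def __init__(self) -> None:
        self.rdf_prefix: Dict[str, int] = {}    # RP_NAME -> RP_ID
        self.prefix_by_id: Dict[int, str] = {}  # RP_ID -> RP_NAME
        self.rdf_iri: Dict[bytes, int] = {}     # RI_NAME -> RI_ID
        self._next_rp_id = 1
        self._next_ri_id = 1

    def add_iri(self, iri: str) -> int:
        """Store iri once: its prefix text in rdf_prefix, the rest in rdf_iri."""
        prefix, local = split_iri(iri)
        if prefix not in self.rdf_prefix:
            self.rdf_prefix[prefix] = self._next_rp_id
            self.prefix_by_id[self._next_rp_id] = prefix
            self._next_rp_id += 1
        ri_name = encode_prefix_id(self.rdf_prefix[prefix]) + local.encode("utf-8")
        if ri_name not in self.rdf_iri:
            self.rdf_iri[ri_name] = self._next_ri_id
            self._next_ri_id += 1
        return self.rdf_iri[ri_name]

    def full_iri(self, ri_name: bytes) -> str:
        """RP_NAME || subseq (RI_NAME, 4): the first four bytes select the prefix."""
        rp_id = int.from_bytes(ri_name[:4], "big")
        return self.prefix_by_id[rp_id] + ri_name[4:].decode("utf-8")

    def iris_under(self, path: str) -> List[str]:
        """Two range scans over sorted keys, mirroring the nested loops in the snippet."""
        prefixes = sorted(self.rdf_prefix)  # stands in for the RP_NAME index
        ri_names = sorted(self.rdf_iri)     # stands in for the RI_NAME index
        found: List[str] = []
        # RP_NAME >= path AND RP_NAME < path || chr(255)
        lo, hi = bisect_left(prefixes, path), bisect_left(prefixes, path + chr(255))
        for rp_name in prefixes[lo:hi]:
            four = encode_prefix_id(self.rdf_prefix[rp_name])
            # RI_NAME >= fourbytes AND RI_NAME < fourbytes || chr(255)
            a, b = bisect_left(ri_names, four), bisect_left(ri_names, four + b"\xff")
            for ri_name in ri_names[a:b]:
                found.append(rp_name + ri_name[4:].decode("utf-8"))
        return found


if __name__ == "__main__":
    store = PrefixCompressedStore()
    for iri in ("http://example.org/ns#name", "http://example.org/ns#age", "http://other.org/x"):
        store.add_iri(iri)
    print(store.rdf_prefix)
    print(store.iris_under("http://example.org/"))
```

Sorting and bisecting stand in for the indexes on RP_NAME and RI_NAME. The upper bounds built with chr(255) and b"\xff" turn "starts with" into a range condition, which is presumably why the snippet writes its WHERE clauses that way instead of using a pattern match.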
