sify

Reputation: 741

Proper way to get the char*/size of a varchar field from a composite-type argument in a PostgreSQL C function

I have a C function that receives a row. The docs only show a simple case involving an integer. I use DatumGetVarCharPP to get a VarChar*, because the source says DatumGetVarCharP is obsolescent.

While debugging, I found that the macros usually used for this, VARDATA and VARSIZE, did not return the right data. I also found that the VarChar had a 1-byte header, so VARHDRSZ should not be used either.

HeapTupleHeader t = PG_GETARG_HEAPTUPLEHEADER(0);
bool isnull;
Datum type = GetAttributeByName(t, "type", &isnull);
VarChar *type_text = DatumGetVarCharPP(type);
char *typestr = VARDATA(type_text);
int typestr_len = VARSIZE(type_text) - VARHDRSZ;

After some research I found that VARDATA_ANY and VARSIZE_ANY return the correct pointer and size; since the header is 1 byte, the real size is size - 1.

But I don't know whether this is the idiomatic way. In the source I read that VARDATA_ANY doesn't work on an external or compressed-in-line Datum, and I don't know how to check whether a Datum is external or compressed in-line. Also, the 1-byte offset is based only on what I observed while debugging.

So what are the proper macros to use instead of VARDATA, VARSIZE and VARHDRSZ? Or should I use the VARATT_IS_* macros to determine the header size and then use the corresponding VARDATA_*/VARSIZE_* macros?

Upvotes: 0

Views: 456

Answers (1)

Laurenz Albe

Reputation: 246578

If you don't need the Datum aligned, use VARDATA_ANY and VARSIZE_ANY, or VARSIZE_ANY_EXHDR for the data length without the header. See postgres.h:

/*
 * In consumers oblivious to data alignment, call PG_DETOAST_DATUM_PACKED(),
 * VARDATA_ANY(), VARSIZE_ANY() and VARSIZE_ANY_EXHDR().  Elsewhere, call
 * PG_DETOAST_DATUM(), VARDATA() and VARSIZE().  Directly fetching an int16,
 * int32 or wider field in the struct representing the datum layout requires
 * aligned data.  memcpy() is alignment-oblivious, as are most operations on
 * datatypes, such as text, whose layout struct contains only char fields.
 *
 * Code assembling a new datum should call VARDATA() and SET_VARSIZE().
 * (Datums begin life untoasted.)
 *
 * Other macros here should usually be used only by tuple assembly/disassembly
 * code and code that specifically wants to work with still-toasted Datums.
 */
[...]
#define VARSIZE_ANY(PTR) \
    (VARATT_IS_1B_E(PTR) ? VARSIZE_EXTERNAL(PTR) : \
     (VARATT_IS_1B(PTR) ? VARSIZE_1B(PTR) : \
      VARSIZE_4B(PTR)))

/* Size of a varlena data, excluding header */
#define VARSIZE_ANY_EXHDR(PTR) \
    (VARATT_IS_1B_E(PTR) ? VARSIZE_EXTERNAL(PTR)-VARHDRSZ_EXTERNAL : \
     (VARATT_IS_1B(PTR) ? VARSIZE_1B(PTR)-VARHDRSZ_SHORT : \
      VARSIZE_4B(PTR)-VARHDRSZ))

/* caution: this will not work on an external or compressed-in-line Datum */
/* caution: this will return a possibly unaligned pointer */
#define VARDATA_ANY(PTR) \
     (VARATT_IS_1B(PTR) ? VARDATA_1B(PTR) : VARDATA_4B(PTR))
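
Applied to the row argument from the question, a minimal sketch of this pattern might look like the following. The function name row_type_length, the NOTICE message and the choice to return the length are illustrative assumptions; only the attribute name "type" comes from the question. It needs the PostgreSQL server headers to build (for example through PGXS) and has not been tested against any particular server version.

```c
#include "postgres.h"
#include "fmgr.h"
#include "executor/executor.h"  /* GetAttributeByName() */
#if PG_VERSION_NUM >= 160000
#include "varatt.h"             /* varlena macros moved here in PostgreSQL 16 */
#endif

PG_MODULE_MAGIC;

/* Hypothetical function name, chosen for this sketch only. */
PG_FUNCTION_INFO_V1(row_type_length);

Datum
row_type_length(PG_FUNCTION_ARGS)
{
    HeapTupleHeader t = PG_GETARG_HEAPTUPLEHEADER(0);
    bool        isnull;
    Datum       d = GetAttributeByName(t, "type", &isnull);
    VarChar    *v;
    char       *typestr;
    int         typestr_len;

    /* A NULL attribute has no varlena to inspect. */
    if (isnull)
        PG_RETURN_NULL();

    /* Detoasts if necessary; the result may still carry a 1-byte header. */
    v = DatumGetVarCharPP(d);

    /* Header-width-agnostic payload length and data pointer. */
    typestr_len = VARSIZE_ANY_EXHDR(v);

    /* The payload is not NUL-terminated, so copy it into a C string. */
    typestr = pnstrdup(VARDATA_ANY(v), typestr_len);

    elog(NOTICE, "type = \"%s\" (%d bytes)", typestr, typestr_len);
    PG_RETURN_INT32(typestr_len);
}
```

There is no need to test VARATT_IS_* by hand here: the _ANY macros pick the header width themselves, and the "size - 1" arithmetic from the question is what VARSIZE_ANY_EXHDR does for a short header.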

Some background: when PostgreSQL is about to persist a table row (tuple) and its size exceeds 2000 bytes, it invokes the TOAST mechanism, which first compresses some attributes and then, if that is not enough, stores them out of line. Such attributes are not immediately “deTOASTed” upon retrieval. You only do that if you need the actual values.

So if you are dealing with data that come from a table, you always have to expect TOASTed values and use the appropriate macros to deTOAST them. DatumGetVarCharPP already does that through PG_DETOAST_DATUM_PACKED, so its result is neither external nor compressed and is safe to pass to VARDATA_ANY.
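
To illustrate why the 4-byte macros misread a value that kept its short header, here is a small standalone C model of the two header layouts. It is a simplified sketch, not PostgreSQL's real code: it assumes the little-endian bit layout, handles only uncompressed in-line values, and every model_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an already-detoasted varlena header
 * (little-endian bit layout). Not PostgreSQL's real macros. */

/* A 1-byte header has its lowest bit set. */
static int model_is_1b(const unsigned char *p)
{
    return (p[0] & 0x01) == 0x01;
}

/* What a 4-byte-only reader computes: a 30-bit total length
 * stored above two flag bits. */
static size_t model_size_4b(const unsigned char *p)
{
    uint32_t h = (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
                 ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
    return (h >> 2) & 0x3FFFFFFF;
}

/* Total size including the header, for either header width. */
static size_t model_size_any(const unsigned char *p)
{
    return model_is_1b(p) ? (size_t) ((p[0] >> 1) & 0x7F) : model_size_4b(p);
}

/* Payload size, excluding whichever header is present. */
static size_t model_size_exhdr(const unsigned char *p)
{
    return model_size_any(p) - (model_is_1b(p) ? 1 : 4);
}

/* Pointer to the payload, skipping whichever header is present. */
static const unsigned char *model_data_any(const unsigned char *p)
{
    return p + (model_is_1b(p) ? 1 : 4);
}

/* Build a value with a 1-byte header; total length must fit in 7 bits. */
static void model_make_short(unsigned char *buf, const char *s, size_t len)
{
    assert(len + 1 <= 0x7F);
    buf[0] = (unsigned char) (((len + 1) << 1) | 0x01);
    memcpy(buf + 1, s, len);
}

/* Build a value with a 4-byte header, written in little-endian order. */
static void model_make_long(unsigned char *buf, const char *s, size_t len)
{
    uint32_t h = (uint32_t) (len + 4) << 2;
    buf[0] = (unsigned char) (h & 0xFF);
    buf[1] = (unsigned char) ((h >> 8) & 0xFF);
    buf[2] = (unsigned char) ((h >> 16) & 0xFF);
    buf[3] = (unsigned char) ((h >> 24) & 0xFF);
    memcpy(buf + 4, s, len);
}
```

For a 3-byte payload such as "abc", model_size_exhdr gives 3 with either header width, whereas model_size_4b applied to the short-header buffer swallows the payload bytes into the length and returns an unrelated number. That matches what the question observed with VARSIZE on a packed value.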

Upvotes: 1
