dlouwers

Reputation: 83

Clarifications for FHIR R4 string element

There are a couple of things that I am having trouble with regarding HL7 FHIR R4 strings (https://www.hl7.org/fhir/datatypes.html#string):

  1. The specification says: "Note that strings SHALL NOT exceed 1MB (1024*1024 characters) in size." My trouble with this is that 1024*1024 Unicode characters are not always 1MB in size. It is also unclear to me which Unicode encoding is meant; I will assume UTF-8, since that is the default for both XML and JSON. For example, the character '🦁' needs 4 bytes in UTF-8, so 1024*1024 such characters would be 4MB in size. The regexes in the notes, though not normative, make this a bit clearer, but not by much. They state that code points up to FFFF are OK, which means at most 3 bytes per character, and that could still exceed the 1MB limit by a factor of 3. My interpretation is that the intent is a reasonable limit that does not open the door to denial-of-service attacks. I would therefore suggest keeping the meaningful 1MB limit and either dropping the number-of-characters requirement or adding it as a separate requirement.
  2. The specification says: "Therefore strings SHOULD always contain non-whitespace content." It does not say what it considers whitespace. Is this just the three code points mentioned earlier (horizontal tab, carriage return and line feed), or are more exotic whitespace characters such as next line or no-break space also prohibited?
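To make the arithmetic in point 1 concrete, here is a minimal Python sketch. It assumes that "characters" means Unicode code points (which is what Python's `len` counts) and that the wire encoding is UTF-8; neither is confirmed by the specification text quoted above. The names `MAX_CHARS`, `utf8_size` and `within_char_limit` are made up for illustration. Languages that count UTF-16 code units, such as Java or JavaScript, would report a length of 2 for '🦁'.

```python
# Limit quoted from the spec, interpreted here (an assumption) as code points.
MAX_CHARS = 1024 * 1024


def utf8_size(value: str) -> int:
    """Return the number of bytes needed to encode value as UTF-8."""
    return len(value.encode("utf-8"))


def within_char_limit(value: str, limit: int = MAX_CHARS) -> bool:
    """Check the limit in characters (code points), not in bytes."""
    return len(value) <= limit


# One sample per UTF-8 width: 1, 2, 3 and 4 bytes per character.
samples = ["a", "é", "€", "🦁"]

for ch in samples:
    # Every sample is a single character, yet the byte size differs.
    print(f"U+{ord(ch):04X}: {len(ch)} character, {utf8_size(ch)} byte(s)")

# Upper bound on the UTF-8 size of a string that still passes the
# character limit: every character takes the maximum of 4 bytes.
worst_case_bytes = MAX_CHARS * 4
print(f"worst case: {worst_case_bytes} bytes")
```

So a string can satisfy the character limit while being up to four times larger than 1MB on the wire, which is the mismatch I am asking about.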

Ok, that about sums up my questions about the string specifications. Hope that someone can help me out.

Best,

Dirk

Upvotes: 0

Views: 535

Answers (1)

Lloyd McKenzie

Reputation: 6793

  1. The rule is expressed in characters precisely because Unicode characters have variable length. There is no maximum in bytes, only in characters (though given Unicode rules, you could calculate what the maximum possible length in bytes might be). If you feel this isn't sufficiently clear, feel free to submit a change request.

  2. The expectation is a string SHOULD always have textual content. If you have nothing to say, omit the element. Trying to work around the "no empty string" limitation by transmitting a non-breaking space or some other non-visible character to meet the non-empty requirement while not actually conveying any human-readable information would be contrary to the intent of the specification. We don't demand that systems enforce this because trying to figure out all the creative ways implementers might have of conveying "no useful text" with Unicode isn't terribly practical. I believe the Java code just does a trim() and compares the result to empty string.
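As a rough illustration of why the choice of "trim" matters, here is a Python sketch of two possible checks. It is not the reference implementation. The first function emulates the documented behaviour of Java's `String.trim()`, which removes leading and trailing characters with code points up to and including U+0020; whether the Java code really uses exactly that call is an assumption based on the recollection above. The second uses Python's own `str.strip()`, which removes everything Python classifies as Unicode whitespace. Both function names are invented for this example.

```python
# Characters removed by Java's String.trim(): every code point <= U+0020.
JAVA_TRIM_CHARS = "".join(chr(cp) for cp in range(0x21))


def has_content_java_trim(value: str) -> bool:
    """Emulate a trim()-then-compare-to-empty check (assumed behaviour)."""
    return value.strip(JAVA_TRIM_CHARS) != ""


def has_content_unicode(value: str) -> bool:
    """Stricter variant: strip all characters Python treats as whitespace."""
    return value.strip() != ""


candidates = {
    "tab/CR/LF/space": " \t\r\n",
    "no-break space": "\u00a0",
    "next line": "\u0085",
    "zero width space": "\u200b",
    "real text": " hello ",
}

for label, text in candidates.items():
    print(label, has_content_java_trim(text), has_content_unicode(text))
```

Under these assumptions a lone no-break space or next line character passes the trim-style check but fails the Unicode-aware one, and a zero width space passes both, which shows why exhaustive enforcement isn't terribly practical.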

Upvotes: 1
