Krister Valtonen

Reputation: 101

HL7 encoding characters in non-ASCII strings

I have a question about how to handle HL7v2 encoding characters that appear when a non-standard (non-7-bit-ASCII) character set is used. As an example, this is part of an HL7v2 message:

MSH|^~\&|appl|fac|||20240314081500||ORM^O01|10089|P|2.3||||||ISO IR87
PID|||Japan_Test_1||Yamada^Tarou~<esc>$B;3ED<esc>(B^<esc>$BB@O:<esc>(B~<esc>$B$d$^$@<esc>(B^<esc>$B$?$m$&<esc>(B|...

where "<esc>" denotes the byte 0x1B (the ESC character). The message uses the "ISO IR87" character set (JIS X 0208-1990). The family name in the last repetition of the patient name contains the JIS X 0208 encoding of the Hiragana letter "ま" (ma), which is the byte pair 0x24 0x5E. Those bytes happen to correspond to the ASCII characters $ and ^.

The question is: since the byte 0x5E appears here, does the HL7 standard require me to escape it? I.e., must I use "\S\" here instead? On the one hand, one can argue that 0x5E, the ASCII encoding of ^, appears and hence needs to be escaped. On the other hand, the caret character (^) does not appear; 0x5E is only part of the encoding of the character "ま" (ma).

In other words, do I need to resolve HL7 escaping first, or do I need to take care of the character encoding first? I have searched the HL7 standard without finding a definitive answer.

Upvotes: 1

Views: 324

Answers (2)

Nick Radov

Reputation: 431

You raise a legitimate question, but I think the HL7 V2 Messaging Standard is somewhat ambiguous on this point. You might want to create an HL7 Jira issue for the Infrastructure and Messaging (InM) work group to clarify this point in the next version. But as a practical matter, many real-world implementations are unlikely to properly support the ISO IR87 character set regardless of what you put in MSH-18. I recommend testing both approaches with your trading partners to see what actually works in practice.

Upvotes: 0

Grahame Grieve

Reputation: 3586

The spec is silent on this because it hadn't really occurred to us that this is an issue. HL7 messages are sequences of characters, not bytes, so you resolve the character encoding first and only then resolve the escaping of the characters.
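To illustrate the "encoding first, escaping second" order on the receiving side, here is a minimal sketch in Python. It assumes that the escape sequences in the question (`ESC $ B` and `ESC ( B`) can be decoded with Python's built-in `iso2022_jp` codec, and it only handles the delimiter escapes `\F\`, `\S\`, `\T\`, `\R\` and `\E\` with the default delimiters. The names `raw_pid5` and `unescape` are made up for this example, not part of any HL7 library.

```python
import re

ESC = b"\x1b"

# PID-5 from the question, as raw bytes off the wire.
raw_pid5 = (
    b"Yamada^Tarou~"
    + ESC + b"$B;3ED" + ESC + b"(B^" + ESC + b"$BB@O:" + ESC + b"(B~"
    + ESC + b"$B$d$^$@" + ESC + b"(B^" + ESC + b"$B$?$m$&" + ESC + b"(B"
)

# Hypothetical helper: resolve only the HL7 delimiter escapes,
# assuming the default delimiters | ^ & ~ and escape character \.
_ESCAPES = {"F": "|", "S": "^", "T": "&", "R": "~", "E": "\\"}

def unescape(component: str) -> str:
    return re.sub(r"\\([FSTRE])\\", lambda m: _ESCAPES[m.group(1)], component)

# Step 1: bytes -> characters (assumption: iso2022_jp matches ISO IR87 here).
text = raw_pid5.decode("iso2022_jp")

# Step 2: split on delimiter *characters*, then resolve escapes per component.
names = [[unescape(c) for c in rep.split("^")] for rep in text.split("~")]

for family, given in names:
    print(family, given)
```

After decoding, the character "ま" is a single character that is not a caret, so the split on "^" leaves it alone and nothing needs to be escaped for it.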

Having said that... there will be parsers out there that don't understand the JIS X 0208 encoding and fall over on this because they see it as an unescaped separator character, so you'd have to check with each and every trading partner.
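The failure mode of such a parser can be reproduced with a short Python sketch. It takes only the Hiragana repetition from the question and compares a split on the raw bytes with a split after decoding. The use of Python's `iso2022_jp` codec for ISO IR87 is an assumption, and the variable names are made up for this example.

```python
# Hiragana repetition of PID-5 from the question, as raw bytes.
hiragana_rep = b"\x1b$B$d$^$@\x1b(B^\x1b$B$?$m$&\x1b(B"

# A byte-oriented parser splits on every 0x5E, including the one
# that is the second byte of the two-byte character 0x24 0x5E.
byte_level = hiragana_rep.split(b"^")

# A character-oriented parser decodes first, then splits on the caret character.
char_level = hiragana_rep.decode("iso2022_jp").split("^")

print(len(byte_level), byte_level)   # one component too many, and broken bytes
print(len(char_level), char_level)   # family name and given name
```

The byte-level split yields three pieces instead of two, and the first two pieces are no longer valid encoded text on their own. That is the behaviour to test for with each trading partner.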

Upvotes: 1
