bluppfisk
bluppfisk

Reputation: 2652

unpacking a data structure whose first byte indicates length

I am trying to unpack a TLE (Tagged Logical Element) from an IBM AFP format file.

The specification (http://www.afpcinc.org/wp-content/uploads/2017/12/MODCA-Reference-09.pdf) indicates that these are two triplets (even though there are four values) that are structured as follows (with their byte offsets):

0: Tlength | 1: Tid | 2-n: Parameter (= 2: Type + 3: Format + 4-n: EBCDIC encoded String)

Example (with two triplets, one indicating the name and one the value):

0C 02  0B  00   C3 A4 99 99 85 95 83 A8    07 36  00 00    C5 E4 D9
12 KEY UID CHAR  C  u  r  r  e  n  c  y     7 VAL RESERVED  E  U  R

I use Perl to parse it as follows (and successfully):

            if ($key eq 'Data') {
                my $tle = $member->{struct}->{$key};
                my $k_length = hex(unpack('H2', substr($tle, 0, 1)));
                my $key = decode('cp500', substr($tle, 4, $k_length - 4));
                my $v_length = hex(unpack('H2', substr($tle, $k_length, 1)));
                my $value = decode('cp500', substr($tle, $k_length + 4, $v_length - 4));
                print("'$key' => '$value'\n");
            }

Result:

'Currency' => 'EUR'

While the above is successful, I feel that my way is a bit too cpmplicated and that there's a more efficient way to do this. E.g. do pack templates support reading the first n bytes to use as a quantifier for how many successive bytes to unpack? I read the Perl pack tutorial but can't seem to find something along those lines.

Upvotes: 1

Views: 221

Answers (2)

Polar Bear
Polar Bear

Reputation: 6798

Please see if following demo code fulfill your requirements.

This code

defines hash decoder subroutines

reads hex representation of bytes provided by OP from DATA block

converts read data into binary representation $data utilizing pack

extracts length, key/tid, type by utilizing unpack

call decoder subroutine for this particular type

gets back hash consisting two arrays keys and vals

forms new hash %data with provided keys and vals

outputs keys and values (returned keys are used to preserve byte/field order)

NOTE: Encode 'from_to' is utilized to decode EBCDIC -- alternative

use strict;
use warnings;
use feature 'say';

use utf8;
use Encode 'from_to';

my $debug = 1;

my %decoder = ( 
                1 => \&decode_type1,
                2 => \&decode_currency,
                3 => \&decode_type3,
                4 => \&decode_type4,
                5 => \&decode_type5
            );

my $bytes = read_bytes();
my($len,$key,$type) = unpack('C3',$bytes);

my $data = $decoder{$type}($bytes);

my %data;
@data{@{$data->{keys}}} = @{$data->{vals}};

say '
 Unpacked data
---------------';
printf "%-8s => %s\n", $_, $data{$_} for @{$data->{keys}};

sub read_bytes {
    my $hex_bytes = <DATA>;

    chomp $hex_bytes;

    my $bytes = pack('H*',$hex_bytes);

    return $bytes;
}

sub show_bytes {
    my $data = shift;

    print "Bytes: ";
    printf "%02X ", $_ for unpack 'C*', $data;
    print "\n";
}

sub decode_type1 {
    my $bytes = shift;

    return { keys => 'type1', vals => 'vals1' };
}

sub decode_currency {
    my $bytes = shift;

    show_bytes($bytes) if $debug;

    my @keys = qw/length_1 key uid char data_1 length_2 val reserved data_2/;
    my @vals = unpack('C4A8C2SA3',$bytes);

    from_to($vals[4], 'cp37', 'latin1');
    from_to($vals[8], 'cp37', 'latin1');
    
    return { keys => \@keys, vals => \@vals};
}

sub decode_type3 {
    my $bytes = shift;

    return { keys => 'type3', vals => 'vals3' };
}

sub decode_type4 {
    my $bytes = shift;

    return { keys => 'type4', vals => 'vals4' };
}

sub decode_type5 {
    my $bytes = shift;

    return { keys => 'type5', vals => 'vals5' };
}

__DATA__
0C020B00C3A49999859583A807360000C5E4D9

Output

Bytes: 0C 02 0B 00 C3 A4 99 99 85 95 83 A8 07 36 00 00 C5 E4 D9

 Unpacked data
---------------
length_1 => 12
key      => 2
uid      => 11
char     => 0
data_1   => Currency
length_2 => 7
val      => 54
reserved => 0
data_2   => EUR

Note:

It looks suspicious that val occupies one byte only what gives range of 0..255 for an amount in Euro. Perhaps reserved bytes might be a part of the val amount of Euro.

Upvotes: 0

ikegami
ikegami

Reputation: 385546

If the length field didn't include itself, you could do something like the following:

(my $record, $unparsed) = unpack("C/a a*", $unparsed);
my $key = decode("cp500", unpack("x3 a*", $record));

But the length field includes itself.

(my $length, $unparsed) = unpack("C a*", $unparsed);
(my $record, $unparsed) = unpack("a".($length-1)." a*", $unparsed);
my $key = decode("cp500", unpack("x3 a*", $record));

Upvotes: 2

Related Questions