Rishabh

Reputation: 43

Breaking out of loop in DFDL

I am trying to convert a flat file to XML using DFDL. The file has the following format: each element is 5 bytes. All elements are on the same line, but I am separating them here to avoid confusion. I will refer to each element by its first character.

0AAAA  
81AAA  
eeeee  
qqqqq    
82BBB    
rrrrr  
sssss  
9QQQQ  

Here 0 and 9 are grandparents; we don't have to worry about them. 8 is a parent, and the second byte of 81AAA (that is, 1) determines the format of its children. There can be many 8 records and many children under each 8 parent (but all children of one parent have the same format).
I tried one schema, but once it goes into the children (eeeee) it never comes back out, and every subsequent record is printed in the children format only.

Upvotes: 0

Views: 432

Answers (1)

stevedlawrence

Reputation: 480

Below is a schema that I think describes your data, tested on Daffodil 2.2.0:

<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:fn="http://www.w3.org/2005/xpath-functions"
  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format ref="GeneralFormat" />
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="Root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="GrandParent" maxOccurs="unbounded">
          <xs:complexType>
            <xs:choice dfdl:initiatedContent="yes">
              <xs:element name="Zero" dfdl:initiator="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Value" type="xs:string" dfdl:length="4" dfdl:lengthKind="explicit" />
                    <xs:element ref="Eight" minOccurs="0" maxOccurs="unbounded" />
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
              <xs:element name="Nine" dfdl:initiator="9">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Value" type="xs:string" dfdl:length="4" dfdl:lengthKind="explicit" />
                    <xs:element ref="Eight" minOccurs="0" maxOccurs="unbounded" />
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:choice>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="Eight" dfdl:initiator="8">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="ChildrenFormat" type="xs:string" dfdl:length="1" dfdl:lengthKind="explicit" />
        <xs:element name="Value" type="xs:string" dfdl:length="3" dfdl:lengthKind="explicit" />
        <xs:choice dfdl:choiceDispatchKey="{ ./ChildrenFormat }">
          <xs:element ref="One" maxOccurs="unbounded" dfdl:choiceBranchKey="1" />
          <xs:element ref="Two" maxOccurs="unbounded" dfdl:choiceBranchKey="2" />
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="One" type="xs:string" dfdl:length="5" dfdl:lengthKind="explicit">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:discriminator test="{ fn:not(fn:starts-with(., '8') or fn:starts-with(., '9')) }" />
      </xs:appinfo>
    </xs:annotation>
  </xs:element>

  <xs:element name="Two" type="xs:string" dfdl:length="5" dfdl:lengthKind="explicit">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:discriminator test="{ fn:not(fn:starts-with(., '8') or fn:starts-with(., '9')) }" />
      </xs:appinfo>
    </xs:annotation>
  </xs:element>

</xs:schema>

A description of how this works:

  • The Root of the data is an unbounded number of GrandParent elements
  • Each GrandParent element contains either a Zero or a Nine, based on the initiator. The initiator consumes the first of the 5 bytes of the grandparent data
  • The Zero/Nine elements contain a Value which consumes the remaining 4 bytes of the grandparent data
  • Following the Value is zero or more Eight elements
  • Each Eight element has an initiator of "8", consuming the first of 5 bytes
  • Each Eight element has a ChildrenFormat, consuming the second of 5 bytes
  • Each Eight element has a Value, consuming the last 3 of 5 bytes
  • Each Eight element has an unbounded number of either all One or all Two elements
  • A choiceDispatchKey/choiceBranchKey pair is used to determine whether to parse all One or all Two elements, dispatching off of the ChildrenFormat element
  • Each One or Two element consumes 5 bytes
  • In order to determine when the unbounded number of One or Two elements ends, a discriminator is placed on the One/Two elements. This discriminator fails when the data parsed as a One/Two starts with an '8' or a '9', which signals the start of the next Eight or Nine record.
  • Also, all fields are treated as strings for simplicity
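
To make the loop-exit rule above concrete, here is a rough Python model of the same nesting logic. It is only a sketch for illustration: it is not DFDL and not how Daffodil works internally, and the names parse_flat and BRANCHES are made up for this example. It assumes the input is one line of 5-byte records, as in the question.

```python
# Illustrative model only: mirrors the nesting rules of the schema, not Daffodil itself.
BRANCHES = {"1": "One", "2": "Two"}  # ChildrenFormat value -> child element name


def parse_flat(data, width=5):
    # Split the single-line input into fixed-width records.
    records = [data[i:i + width] for i in range(0, len(data), width)]
    root, pos = [], 0
    while pos < len(records):
        rec = records[pos]
        if rec[0] not in "09":
            raise ValueError(f"expected a 0 or 9 record, got {rec!r}")
        # The initiator takes the first byte; Value takes the remaining four.
        grandparent = {
            "kind": "Zero" if rec[0] == "0" else "Nine",
            "Value": rec[1:],
            "Eight": [],
        }
        pos += 1
        # Zero or more Eight records, each recognised by its "8" initiator.
        while pos < len(records) and records[pos][0] == "8":
            rec = records[pos]
            fmt = rec[1]
            if fmt not in BRANCHES:
                raise ValueError(f"unknown ChildrenFormat {fmt!r}")
            eight = {"ChildrenFormat": fmt, "Value": rec[2:], "children": []}
            pos += 1
            # Same test as the discriminator: a record starting with
            # '8' or '9' is not a child, so the child loop stops here.
            while pos < len(records) and records[pos][0] not in "89":
                eight["children"].append((BRANCHES[fmt], records[pos]))
                pos += 1
            grandparent["Eight"].append(eight)
        root.append(grandparent)
    return root


if __name__ == "__main__":
    sample = "0AAAA81AAAeeeeeqqqqq82BBBrrrrrsssss9QQQQ"
    for gp in parse_flat(sample):
        print(gp)
```

The inner while loop plays the role of the discriminator: without that check, every remaining record would be consumed as a child, which is the behaviour described in the question.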

With this, your example data parses to an infoset like so:

<Root>
  <GrandParent>
    <Zero>
      <Value>AAAA</Value>
      <Eight>
        <ChildrenFormat>1</ChildrenFormat>
        <Value>AAA</Value>
        <One>eeeee</One>
        <One>qqqqq</One>
      </Eight>
      <Eight>
        <ChildrenFormat>2</ChildrenFormat>
        <Value>BBB</Value>
        <Two>rrrrr</Two>
        <Two>sssss</Two>
      </Eight>
    </Zero>
  </GrandParent>
  <GrandParent>
    <Nine>
      <Value>QQQQ</Value>
    </Nine>
  </GrandParent>
</Root>

Upvotes: 0
