Reputation: 503
I would like to use TensorFlow to process XML strings that are proper TFRecords, and I want to understand how to structure the code that parses each one. A set of input rules and data type mappings is applied to each input record to produce an output TFRecord.
Example input TFRecord:
<PLANT><COMMON>Shooting Star</COMMON><BOTANICAL>Dodecatheon</BOTANICAL><ZONE>Annual</ZONE><LIGHT>Mostly Shady</LIGHT><PRICE>$8.60</PRICE><EXTREF><REF1><ID>608</ID><TYPE>LOOKUP</TYPE></REF1><REF2><ID>703</ID><TYPE>STD</TYPE></REF2></EXTREF><AVAILABILITY>051399</AVAILABILITY></PLANT>
The rules show what needs to be parsed and how it needs to be formatted. E.g. find the COMMON, PRICE, EXTREF>REF2>ID and AVAILABILITY elements and export their values as a TFRecord.
Example output TFRecord:
Shooting Star,8.60,703,51399
How do I add this logic to a graph so when it executes it produces the output TFRecord? My initial thoughts are that I need to translate the mapping logic into a series of tf.ops...
Upvotes: 0
Views: 1317
Reputation: 629
I believe this link will be very helpful to you. It specifies the exact format that a TFRecord needs, and it provides the code to turn your own dataset into a TFRecord file.
However, that link does not mention XML files. It only covers how to create a tf_example and turn it into a TFRecord. This link goes a step back and shows how to turn an XML file into a tf_example. Note that it will need some modification to fit your needs because it is written for the Oxford Pet Dataset.
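As a rough illustration of the XML to tf_example to TFRecord path for the plant record in the question, here is a minimal sketch. It is not taken from either linked page. It assumes the XML is well-formed, with REF1 and REF2 closed as sibling elements under EXTREF, and that parsing happens in plain Python with xml.etree.ElementTree before anything is handed to TensorFlow. The RULES table, the feature names, and the helper functions are all hypothetical names made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: (output feature name, element path, type converter).
# Paths are relative to the <PLANT> root element.
RULES = [
    ("common", "COMMON", str),
    ("price", "PRICE", lambda s: float(s.lstrip("$"))),
    ("ref2_id", "EXTREF/REF2/ID", int),
    ("availability", "AVAILABILITY", int),
]

# Sample record from the question, assuming REF1 and REF2 are properly closed.
SAMPLE = (
    "<PLANT><COMMON>Shooting Star</COMMON><BOTANICAL>Dodecatheon</BOTANICAL>"
    "<ZONE>Annual</ZONE><LIGHT>Mostly Shady</LIGHT><PRICE>$8.60</PRICE>"
    "<EXTREF><REF1><ID>608</ID><TYPE>LOOKUP</TYPE></REF1>"
    "<REF2><ID>703</ID><TYPE>STD</TYPE></REF2></EXTREF>"
    "<AVAILABILITY>051399</AVAILABILITY></PLANT>"
)


def apply_rules(xml_string, rules=RULES):
    """Parse one XML record and return a dict of converted values."""
    root = ET.fromstring(xml_string)
    row = {}
    for name, path, convert in rules:
        text = root.findtext(path)
        if text is None:
            raise ValueError(f"element not found: {path}")
        row[name] = convert(text.strip())
    return row


def to_tf_example(row):
    """Wrap a parsed row in a tf.train.Example, one feature per field."""
    import tensorflow as tf  # imported lazily so parsing works without TensorFlow

    def feature(value):
        if isinstance(value, int):
            return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
        if isinstance(value, float):
            return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
        return tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[str(value).encode("utf-8")])
        )

    features = {name: feature(value) for name, value in row.items()}
    return tf.train.Example(features=tf.train.Features(feature=features))


def write_tfrecord(xml_strings, path):
    """Convert each XML string and append it to a TFRecord file at `path`."""
    import tensorflow as tf

    with tf.io.TFRecordWriter(path) as writer:
        for xml_string in xml_strings:
            example = to_tf_example(apply_rules(xml_string))
            writer.write(example.SerializeToString())
```

Calling write_tfrecord([SAMPLE], "plants.tfrecord") would write one record holding the four extracted fields (the file name is just a placeholder). Note that FloatList stores 32-bit floats, so if the price has to survive exactly it may be safer to keep it as a string or as an integer number of cents.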
Upvotes: 1