Atl LED

Reputation: 666

Batch, Parse, and Convert Meta-Data from .1sc Files

TLDR: Questions are after the break.

I am looking to convert and store information from a large (3 TB) set of *.1sc images (Bio-Rad, Quantity One). In addition to the actual image, each file contains a good deal of information about where and how the image was taken (meta-data). All of this seems to be held in the Intel Hex format (or at least the files all open with "Stable File Version 2.0 Intel Format" when viewed in a hex editor).

The ImageJ plugin Bio-Formats can handle the image, and includes meta-data functionality in MetadataTools. To convert just the images in batch, I had great success using the batchTiffconvert plugin. The meta-data available in ImageJ seems to be incomplete for this format, but I'm not certain how to use MetadataTools (any good guide references would be appreciated; I'm currently going over the API).

My real problem isn't parsing the hex to find what I'm looking for. Where I'm failing is converting the hex into something meaningful. Example:

[Image: .1sc hex example from VS2013]

I can parse the hex for scan_area, but I haven't been able to convert 00 10 00 16 00 EC B5 86 00 into something meaningful.
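One way to approach the conversion is to print every plausible reading of the bytes and compare the results against values shown in the Bio-Rad software. The following is only a sketch under assumptions: the class and method names (HexCandidates, parseHex, u16, u32, f32) are made up for illustration, and nothing here is known about the real .1sc field layout. The "Intel Format" header hints at little-endian byte order, but that is a guess, so both orders are tried at every offset.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: lists candidate interpretations of raw bytes.
// It does NOT know the .1sc layout; it only enumerates possibilities.
class HexCandidates {
    /** Turn "00 10 00 16" style text copied from a hex editor into raw bytes. */
    static byte[] parseHex(String text) {
        String[] parts = text.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    /** Unsigned 16-bit value at the given offset and byte order. */
    static int u16(byte[] b, int offset, ByteOrder order) {
        return ByteBuffer.wrap(b).order(order).getShort(offset) & 0xFFFF;
    }

    /** Unsigned 32-bit value at the given offset and byte order. */
    static long u32(byte[] b, int offset, ByteOrder order) {
        return ByteBuffer.wrap(b).order(order).getInt(offset) & 0xFFFFFFFFL;
    }

    /** IEEE 754 single-precision value at the given offset and byte order. */
    static float f32(byte[] b, int offset, ByteOrder order) {
        return ByteBuffer.wrap(b).order(order).getFloat(offset);
    }

    public static void main(String[] args) {
        // The bytes quoted in the question, following scan_area.
        byte[] raw = parseHex("00 10 00 16 00 EC B5 86 00");
        ByteOrder[] orders = {ByteOrder.LITTLE_ENDIAN, ByteOrder.BIG_ENDIAN};
        for (ByteOrder order : orders) {
            for (int off = 0; off + 2 <= raw.length; off++) {
                StringBuilder line = new StringBuilder(order + " @" + off + ":");
                line.append(" u16=").append(u16(raw, off, order));
                if (off + 4 <= raw.length) {
                    line.append(" u32=").append(u32(raw, off, order));
                    line.append(" f32=").append(f32(raw, off, order));
                }
                System.out.println(line);
            }
        }
    }
}
```

If one of the printed numbers matches something the official software reports for the same image (a pixel count, a dimension, a resolution), that suggests the offset, width, and byte order of the field. The leading bytes may also be a type tag or length prefix rather than part of the value, which is why every offset is tried.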

Approaching this from the same direction as a similar DM3 question, I was able to make an XML file, but even when I wrote out the whole XML file, much of the meta-data wasn't included (it did have things like the date-stamp, which is good). I think this is because of the information passed to GelReader.Java from BioRadReader.Java. In particular this section:

if (getMetadataOptions().getMetadataLevel() != MetadataLevel.MINIMUM) {
  String units = firstIFD.getIFDStringValue(MD_FILE_UNITS);
  String lab = firstIFD.getIFDStringValue(MD_LAB_NAME);

  addGlobalMeta("Scale factor", scale);
  addGlobalMeta("Lab name", lab);
  addGlobalMeta("Sample info", info);
  addGlobalMeta("Date prepared", prepDate);
  addGlobalMeta("Time prepared", prepTime);
  addGlobalMeta("File units", units);
  addGlobalMeta("Data format",
    fmt == SQUARE_ROOT ? "square root" : "linear");
}

This block appears to be skipped because the MetadataLevel set in all the Bio-Rad scripts is MetadataLevel.MINIMUM. I tried adding the additional meta-data I wanted here, but again I couldn't convert or decode it into anything useful.

Is it possible to retrieve more of the meta-data using this system? If so, am I working in the right section of code? The source for Bio-Formats is quite large, and I won't even pretend to have a good grasp of it (though I'm trying). Am I just running into a proprietary format problem? Can anyone tell me how to convert the hex values, or point me to a resource that explains it?

Upvotes: 1

Views: 513

Answers (1)

ctrueden

Reputation: 6982

First of all: note that neither of the sources you linked above actually corresponds to the .1sc file format reader of Bio-Formats. You want the BioRadGelReader.

The Bio-Formats library parses three types of metadata. From the About Bio-Formats page:

There are three types of metadata in Bio-Formats, which we call core metadata, original metadata, and OME metadata.

  1. Core metadata only includes things necessary to understand the basic structure of the pixels: image resolution; number of focal planes, time points, channels, and other dimensional axes; byte order; dimension order; color arrangement (RGB, indexed color or separate channels); and thumbnail resolution.
  2. Original metadata is information specific to a particular file format. These fields are key/value pairs in the original format, with no guarantee of cross-format naming consistency or compatibility. Nomenclature often differs between formats, as each vendor is free to use their own terminology.
  3. OME metadata is information from #1 and #2 converted by Bio-Formats into the OME data model. Performing this conversion is the primary purpose of Bio-Formats. Bio-Formats uses its ability to convert proprietary metadata into OME-XML as part of its integration with the OME and OMERO servers—essentially, they are able to populate their databases in a structured way because Bio-Formats sorts the metadata into the proper places. This conversion is nowhere near complete or bug free, but we are constantly working to improve it. We would greatly appreciate any and all input from users concerning missing or improperly converted metadata fields.

The Bio-Formats command line tools are capable of dumping all original metadata key/value pairs for a given dataset, as well as the converted OME-XML.

In your case, if what you want is quantity over quality, you probably want to record all the original metadata somehow. The showinf command line tool does that automatically (you actually have to pass the -nometa flag to suppress it).

If you look over the complete list of original metadata key/value pairs and the information you seek is still not there, then we'd have to go to the next level and improve the BioRadGelReader to parse more metadata.

Unfortunately, inspecting the source code, it looks like essentially nothing is parsed into the original metadata table for that file format. It was likely reverse engineered, since the Bio-Rad Gel format page says that we do not have a specification document for it.

So what that means is that the Bio-Formats developers are as clueless about the file structure as you are, and would do the same thing you are doing: stare at a hex editor and try to figure things out. Some tricks include:

  • Look up metadata values using the official Bio-Rad software, then search for those values in various encodings using your hex editor.
  • Edit one metadata value (if possible) using the official Bio-Rad software—or by doing multiple acquisitions as similarly as possible except for one variable—then diff the output files to see what effect changing that value had.
  • Check whether the first few hundred bytes match a known pattern for container formats such as Microsoft OLE-based data, TIFF-based data, or HDF-based data, since many formats reuse these general container structures.
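
The tricks above can be sketched with plain Java and no Bio-Formats dependency. This is a rough illustration, not part of Bio-Formats: the class and method names (FormatProbe, sniffContainer, findAll, int32, diffOffsets) are invented here. The signatures checked are the well-known magic numbers for OLE2 compound files, classic TIFF, and HDF5, tested at offset 0 only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reverse-engineering helpers; names are illustrative only.
class FormatProbe {
    static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) return false;
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) return false;
        }
        return true;
    }

    /** Compare the file header against a few common container signatures. */
    static String sniffContainer(byte[] header) {
        if (startsWith(header, 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1)) return "OLE2";
        if (startsWith(header, 0x49, 0x49, 0x2A, 0x00)
                || startsWith(header, 0x4D, 0x4D, 0x00, 0x2A)) return "TIFF";
        if (startsWith(header, 0x89, 0x48, 0x44, 0x46, 0x0D, 0x0A, 0x1A, 0x0A)) return "HDF5";
        return "unknown";
    }

    /** Encode a 32-bit integer in the given byte order, ready to search for. */
    static byte[] int32(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    /** Offsets of every occurrence of needle in haystack (naive scan). */
    static List<Integer> findAll(byte[] haystack, byte[] needle) {
        List<Integer> hits = new ArrayList<>();
        if (needle.length == 0) return hits;
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            int j = 0;
            while (j < needle.length && haystack[i + j] == needle[j]) j++;
            if (j == needle.length) hits.add(i);
        }
        return hits;
    }

    /** Offsets where two files differ, over their common length. */
    static List<Integer> diffOffsets(byte[] a, byte[] b) {
        List<Integer> diffs = new ArrayList<>();
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) diffs.add(i);
        }
        return diffs;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: FormatProbe <file> <known value> [other file]");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("container: " + sniffContainer(data));
        // Search for the known value as text in two encodings.
        System.out.println("ASCII:    " + findAll(data, args[1].getBytes(StandardCharsets.US_ASCII)));
        System.out.println("UTF-16LE: " + findAll(data, args[1].getBytes(StandardCharsets.UTF_16LE)));
        // If the value is an integer, also search both binary byte orders.
        try {
            int v = Integer.parseInt(args[1]);
            System.out.println("int32 LE: " + findAll(data, int32(v, ByteOrder.LITTLE_ENDIAN)));
            System.out.println("int32 BE: " + findAll(data, int32(v, ByteOrder.BIG_ENDIAN)));
        } catch (NumberFormatException e) {
            // Not an integer; the text searches above are all that apply.
        }
        if (args.length > 2) {
            byte[] other = Files.readAllBytes(Paths.get(args[2]));
            System.out.println("differing offsets: " + diffOffsets(data, other));
        }
    }
}
```

Run it with a file, a value read from the official software, and optionally a second acquisition that differs in one setting. An "unknown" container result is not conclusive: it only means none of these three signatures sits at the very start of the file.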

You could also email Bio-Rad to ask whether they are willing to send a spec, and if so, use it to improve the file format reader, and/or forward it on to the Bio-Formats developers.

Upvotes: 1
