Rehim Alizadeh

Reputation: 369

How to open .ProcSpec files in python?

I have data obtained from the SpectraSuite software (Processed Spectrum, OOI Binary File). I tried to open it with:

with open(r'C:\Users\path/file.ProcSpec', 'rb') as f:
    file_content = f.read()

but file_content is binary data like b'PK\x03\x04\x14\x00\x08\x00\x08\x007t\x...'. How can I open this type of file as tabular data?

Upvotes: 0

Views: 447

Answers (1)

user_na

Reputation: 2273

Method 1: Export from SpectraSuite

In SpectraSuite, you can open the files and export them again with the file type set to Tab Delimited. You should then be able to import them with Python's csv module.

See how to open csv files here.
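
As a minimal sketch of that import step: the function below reads a tab-delimited text file with the csv module. The function name `read_tab_delimited` is hypothetical, and the layout is an assumption (two numeric columns per data row, wavelength then value); check it against your actual export. Rows that do not parse as two numbers, such as header or metadata lines, are skipped.

```python
import csv


def read_tab_delimited(path):
    """Return (wavelengths, values) from a tab-delimited text export.

    Assumption: each data row holds two numeric columns. Any row that
    does not parse as two floats (header, metadata) is skipped.
    """
    wavelengths, values = [], []
    with open(path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if len(row) < 2:
                continue
            try:
                w, v = float(row[0]), float(row[1])
            except ValueError:
                continue  # not a data row
            wavelengths.append(w)
            values.append(v)
    return wavelengths, values
```

Call it as `wavelengths, values = read_tab_delimited(path_to_export)`. If the export uses a decimal comma or more columns, the parsing has to be adapted.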

Method 2: Read directly from Python

There is existing MATLAB code that does what you want.

From the MATLAB code we can see that the OOI format is a zipped XML file. There also seem to be some characters in it that an XML parser cannot handle.
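
The b'PK\x03\x04' bytes at the start of the content shown in the question are the signature of a zip archive, which matches this. As a quick check, here is a small sketch that uses the standard zipfile module to list what an archive contains. The function name `list_archive_members` is hypothetical.

```python
import zipfile


def list_archive_members(path):
    """Return the member names of a zip-based file, or [] if it is not a zip."""
    if not zipfile.is_zipfile(path):
        return []
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()
```

Calling `list_archive_members(r'your\path\file.ProcSpec')` should show the XML members that the reader code has to parse.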

Knowing this, the files can be opened with the following code. Note that this is a quick implementation without error handling or thorough testing.

import glob
import os
import shutil
import tempfile
import xml.etree.ElementTree as ET

import numpy as np


def parseNodeText(node):
    """Convert a node's text to bool, int or float where possible."""
    nodeText = node.text
    if nodeText in ('true', 'false'):
        return nodeText == 'true'
    for cast in (int, float):
        try:
            return cast(nodeText)
        except (TypeError, ValueError):
            pass
    # Fall back to the raw text (or None for an empty node).
    return nodeText


def extractArrayFromNode(node):
    """Collect the child values of a node into a NumPy array."""
    arr = [parseNodeText(val) for val in node]
    # Integer children become int32; everything else, including
    # 'double', is stored as float32.
    tag = node[0].tag
    dt = np.int32 if 'int' in tag and 'double' not in tag else np.float32
    return np.array(arr, dtype=dt)


def readProcSpecFile(filePath):
    """Return a list of dicts, one per spectrum stored in a .ProcSpec file."""
    result = []
    # Unpack into a private temporary directory that is removed afterwards,
    # so no existing 'tmp' folder next to the data can be deleted by accident.
    with tempfile.TemporaryDirectory() as tmpdir:
        shutil.unpack_archive(filePath, tmpdir, format='zip')

        # os.path.join keeps the pattern portable across operating systems.
        for f in glob.glob(os.path.join(tmpdir, '*.xml')):
            if 'OOISignatures.xml' in f:
                continue

            with open(f, 'rb') as fi:
                s = fi.read()

            # Replace the bytes that the XML parser cannot handle.
            for c in (b'\xa0', b'\x89', b'\x80'):
                s = s.replace(c, b' ')

            root = ET.fromstring(s)
            for spectra in root[0]:
                spect = {}
                for node in spectra:
                    t = node.tag
                    if t in ('pixelValues', 'channelWavelengths'):
                        spect[t] = extractArrayFromNode(node)
                    elif t == 'acquisitionTime':
                        spect[t] = {val.tag: parseNodeText(val) for val in node}
                    elif t in ('certificates', 'channelCoefficients'):
                        pass  # skipped
                    else:
                        spect[t] = parseNodeText(node)
                result.append(spect)
    # Return only after every XML file has been processed.
    return result

Here is an example of how to use it:

iFile = r'your\path\file.ProcSpec'
spects = readProcSpecFile(iFile)
spect = spects[0]  
print(spect.keys())
wavelength = spect['channelWavelengths']
pixelValues = spect['pixelValues']

This is the output:

dict_keys(['saturated', 'integrationTime', 'strobeEnabled', 'strobeDelay', 'pixelValues', 'acquisitionTime', 'boxcarWidth', 'scansToAverage', 'correctForElectricalDark', 'correctForNonLinearity', 'correctForStrayLight', 'rotationEnabled', 'userName', 'channelWavelengths', 'channelNumber', 'channelStabilityScanEnabled', 'channelExternalTriggerEnabled', 'laserWavelength', 'interlock', 'numberOfPixels', 'numberOfDarkPixels', 'spectrometerSerialNumber', 'spectrometerFirmwareVersion', 'spectrometerClass', 'spectrometerPlugins', 'spectrometerNumberOfChannels', 'spectrometerMaximumIntensity', 'spectrometerMinimumIntegrationTime', 'spectrometerMaximumIntegrationTime', 'spectrometerIntegrationTimeStep', 'spectrometerIntegrationTimeBase', 'spectrometerNumberOfPixels', 'spectrometerNumberOfDarkPixels'])

In [2]:wavelength
Out[2]:array([ 199.51251,  199.98462,  200.4567 , ..., 1116.266  , 1116.6813 ,
       1117.0964 ], dtype=float32)

In [3]:pixelValues
Out[3]:array([64000. ,  1958.1,  1958.1, ...,  1957.1,  1957.1,  1957.5],
      dtype=float32)
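
Since the question asks for tabular data, here is a minimal sketch that writes the two arrays to a tab-delimited text file. The function name `write_spectrum_table`, the column headers, and the output path are assumptions chosen for illustration. It uses only the standard csv module and works with lists as well as NumPy arrays.

```python
import csv


def write_spectrum_table(path, wavelengths, values):
    """Write wavelength/value pairs as a two-column tab-delimited table."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["wavelength", "pixelValue"])  # hypothetical headers
        writer.writerows(zip(wavelengths, values))
```

For example, `write_spectrum_table('spectrum.txt', wavelength, pixelValues)` would produce a file that opens in any spreadsheet program.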

Here is a plot of the wavelength and values arrays:

[Figure: pixelValues plotted against channelWavelengths]

Upvotes: 2
