Reputation: 621
I have a dataset of more than 100,000 .IMG files that I need to convert to .PNG or .JPG format so that I can train a CNN on them for a simple classification task.
I followed this answer, and the solution works only partially: some images are not converted properly. As far as I understand, the reason is that some images have a pixel depth of 16 bits while others have 8.
for file in fileList:
    rawData = open(file, 'rb').read()
    size = re.search("(LINES = \d\d\d\d)|(LINES = \d\d\d)", str(rawData))
    pixelDepth = re.search("(SAMPLE_BITS = \d\d)|(SAMPLE_BITS = \d)", str(rawData))
    size = (str(size)[-6:-2])
    pixelDepth = (str(pixelDepth)[-4:-2])
    print(int(size))
    print(int(pixelDepth))
    imgSize = (int(size), int(size))
    img = Image.frombytes('L', imgSize, rawData)
    img.save(str(file)+'.jpg')
Data Source: NASA Messenger Mission
.IMG files and their corresponding converted .JPG Files
Please let me know if there's any more information that I should provide.
Upvotes: 1
Views: 775
Reputation: 207345
Hopefully, from my other answer here, you now have a better understanding of how your files are formatted. The code should then look something like this:
#!/usr/bin/env python3
import sys
import re
import numpy as np
from PIL import Image
import cv2

rawData = open('EW0220137564B.IMG', 'rb').read()

# File size in bytes
fs = len(rawData)

# Read the label values from the text header
header = str(rawData)
bitDepth = int(re.search(r"SAMPLE_BITS\s+=\s+(\d+)", header).group(1))
bytespp = bitDepth // 8
height = int(re.search(r"LINES\s+=\s+(\d+)", header).group(1))
width = int(re.search(r"LINE_SAMPLES\s+=\s+(\d+)", header).group(1))
print(bitDepth, height, width)

# Offset from start of file to image data - assumes image at tail end of file
offset = fs - (width*height*bytespp)

# Check bitDepth
if bitDepth == 8:
    na = np.frombuffer(rawData, offset=offset, dtype=np.uint8).reshape(height, width)
elif bitDepth == 16:
    # 16-bit samples are read as big-endian
    dt = np.dtype(np.uint16)
    dt = dt.newbyteorder('>')
    # Note: astype(np.uint8) does not rescale, so values above 255 wrap around
    na = np.frombuffer(rawData, offset=offset, dtype=dt).reshape(height, width).astype(np.uint8)
else:
    print(f'ERROR: Unexpected bit depth: {bitDepth}', file=sys.stderr)
    sys.exit(1)  # na is undefined for other bit depths, so stop here

# Save either with PIL
Image.fromarray(na).save('result.jpg')

# Or with OpenCV may be faster
cv2.imwrite('result.jpg', na)
If you have thousands to do, I would recommend GNU Parallel, which you can easily install on your Mac with Homebrew using:
brew install parallel
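You can then change my program above to accept a filename as a parameter in place of the hard-coded filename. The command to process them all in parallel is shown next; note that --dry-run only prints the commands that would be run, so remove it once they look right: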
parallel --dry-run script.py {} ::: *.IMG
For a bit more effort, you can get it done even faster by putting the code above in a function and calling the function for each file specified as a parameter. That way you can avoid starting a new Python interpreter per image and tell GNU Parallel to pass as many files as possible to each invocation of your script like this:
parallel -X --dry-run script.py ::: *.IMG
The structure of the script then looks like this:
import sys

def processOne(filename):
    # Open, read, search, extract, save as per my code above
    ...

# Main - process all filenames received as parameters
for filename in sys.argv[1:]:
    processOne(filename)
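When each invocation receives many filenames, an exception raised for one file stops the rest of that batch from being processed. One possible way to guard against that is sketched below; `process_all` and the placeholder `convert` are hypothetical names introduced here for illustration, and the real conversion code would take the place of `convert`.

```python
import sys


def process_all(filenames, process_one):
    """Call process_one on each filename and return (filename, error) pairs for failures."""
    failures = []
    for filename in filenames:
        try:
            process_one(filename)
        except Exception as exc:
            # Keep going so that one bad file does not stop the whole batch
            failures.append((filename, exc))
            print(f"ERROR: {filename}: {exc}", file=sys.stderr)
    return failures


def convert(filename):
    """Placeholder for the real conversion; it only rejects names that are not .IMG."""
    if not filename.upper().endswith(".IMG"):
        raise ValueError("not an .IMG file")


if __name__ == "__main__":
    failed = process_all(sys.argv[1:], convert)
    if failed:
        sys.exit(1)  # a non-zero exit status marks the job as failed
```

The non-zero exit status lets the calling tool see that at least one file in the batch failed, while the messages on stderr identify which ones.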
Upvotes: 3