Reputation: 75
I am doing some image processing in Python and need to crop multiple areas within many thousands of images. The pixel coordinates used to crop each ROI (region of interest) are in an Excel spreadsheet, arranged as THREE comma-separated values within ONE column. As the example data shows, each image contains multiple ROIs that need cropping.
The three pixel coordinate values in this column are given as [x,y,r], where "x" and "y" mark the top left-hand corner of the square ROI and "r" is the length of each of its four sides, as seen here. Since only one corner is given, the ROI can be obtained with "ROI = im[Y:Y+R, X:X+R]", but I'm struggling to get to this stage.
I have used the pandas.read_excel function to read in the spreadsheet, but I can't get any further. Can anyone help, please?
Thanks, Rhod
Upvotes: 1
Views: 544
Reputation: 207660
You can do it like this:
#!/usr/bin/env python3
import re
import cv2
import numpy as np
import pandas as pd

# Open spreadsheet
excel_file = 'spreadsheet.xlsx'
ss = pd.read_excel(excel_file)

# Extract filenames and coordinates
FandC = []
# head() limits this demonstration to the first five rows;
# iterate over ss.iterrows() instead to process every row
for index, row in ss.head().iterrows():
    filename = row['filename']
    coords = row['Pixel coords']
    # Use regex to find anything that looks like a bunch of digits, possibly with a decimal point
    x, y, r = re.findall(r'[0-9.]+', coords)
    print(f'DEBUG: filename={filename}, x={x}, y={y}, r={r}')
    # re.findall returns strings, so convert to numbers before storing
    FandC.append({'filename': filename, 'x': float(x), 'y': float(y), 'r': float(r)})
You now have a list of filenames and coordinates in FandC, and the debug output looks like this:
DEBUG: filename=M116_13331848_13109013315679.jpg, x=1345.83, y=1738, r=44.26
DEBUG: filename=M116_13331848_13109013315679.jpg, x=776.33, y=698.17, r=65.72
DEBUG: filename=M116_13331848_13109013315679.jpg, x=1215.5, y=485.67, r=61.16
DEBUG: filename=M116_13331848_13109013315679.jpg, x=1439.33, y=502.67, r=64.73
DEBUG: filename=M116_13331848_13109013315679.jpg, x=793.33, y=1661.5, r=86.03
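From there, the cropping step from the question (ROI = im[Y:Y+R, X:X+R]) could be sketched roughly as below. This is only a sketch under a few assumptions: the coordinates are fractional, so they are rounded to whole pixels and clamped to the image size before slicing; the filenames in the spreadsheet are assumed to be paths that cv2.imread can open directly; and the helper names roi_bounds and crop_all, the "crops" output folder and the output naming scheme are all made up for illustration.

```python
from pathlib import Path


def roi_bounds(x, y, r, width, height):
    """Return integer (top, bottom, left, right) slice bounds for a square ROI.

    x, y is the top-left corner and r the side length, as in the question.
    Values are rounded to the nearest pixel and clamped to the image size.
    """
    left = max(0, min(int(round(float(x))), width))
    top = max(0, min(int(round(float(y))), height))
    right = max(left, min(int(round(float(x) + float(r))), width))
    bottom = max(top, min(int(round(float(y) + float(r))), height))
    return top, bottom, left, right


def crop_all(FandC, out_dir="crops"):
    """Crop every ROI listed in FandC and save each one as its own file."""
    import cv2  # imported here so roi_bounds can be used without OpenCV

    Path(out_dir).mkdir(exist_ok=True)
    for i, item in enumerate(FandC):
        im = cv2.imread(item["filename"])
        if im is None:  # cv2.imread returns None if the file cannot be read
            print(f"Skipping unreadable image: {item['filename']}")
            continue
        height, width = im.shape[:2]
        top, bottom, left, right = roi_bounds(
            item["x"], item["y"], item["r"], width, height
        )
        roi = im[top:bottom, left:right]  # rows (y) first, then columns (x)
        if roi.size == 0:  # ROI lies entirely outside the image
            continue
        # Hypothetical naming scheme: original file stem plus the row index
        out_name = f"{Path(item['filename']).stem}_roi{i}.png"
        cv2.imwrite(str(Path(out_dir) / out_name), roi)
```

Calling crop_all(FandC) after the loop above would then write one small image per spreadsheet row. With thousands of images and several ROIs per image, it would probably be worth grouping the rows by filename so each image is only read once, but that is left out here to keep the sketch short.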
Upvotes: 1