Extracting a curve and identifying coordinates from an image using OpenCV

Currently I am particularly interested in pacing strategy optimization in different sports. As part of these kind of processes, I have to define certain courses concerning distance and changes in altitude along the course. I thought I could make this process easier by extracting the data I need from downloaded schematic course profiles (.png) using OpenCV, instead of other lot more manual and time consuming solutions.

After reviewing some of the previous questions at SO, I aimed for identifying the curve representing the course profile of a specific course iodentify some of its points using a predetermined step size (pixels) and saving their coordinates to.csv. However, since I am a very beginner in OpenCV, I believe there must be some better approaches, unsurprisingly, my attempt wasn't well-executed.

Something I also find problematic in my approach is that identification of the curve is very color dependent, thereby if an image with different graphics is used, I need to rewrite parts of the code. Is there a way to avoid this?

Here you can find an example image, representing a course profile I used for testing my code Course profile

My code looks like as follows:

`import cv2
import numpy as np
import matplotlib.pyplot as plt


image_path = r"image"
image = cv2.imread(image_path)


hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

color_blue1 = np.array([100, 50, 50])
color_blue2 = np.array([140, 255, 255])

mask = cv2.inRange(hsv_image, color_blue1, color_blue2)


blue_curve = cv2.bitwise_and(image, image, mask=mask)

# Converting the result to grayscale
gray_curve = cv2.cvtColor(blue_curve, cv2.COLOR_BGR2GRAY)

edges = cv2.Canny(gray_curve, 50, 150)

contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

curve_contour = max(contours, key=cv2.contourArea)

curve_points = curve_contour.squeeze()

curve_points = curve_points[np.argsort(curve_points[:, 0])]


def sample_curve(points, step):
    x_values = points[:, 0]
    sampled_points = []
    
    for x in range(x_values.min(), x_values.max(), step):

        interval_points = points[(x_values >= x) & (x_values < x + step)]
        if interval_points.size:

            avg_y = interval_points[:, 1].mean()
            sampled_points.append([x, avg_y])
    
    return np.array(sampled_points)

# Set step frequency (pixels)
step_frequency = 5
sampled_curve = sample_curve(curve_points, step_frequency)


plt.figure(figsize=(15, 7))
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.plot(curve_points[:, 0], curve_points[:, 1], 'r-', label='Detected curve')
plt.scatter(sampled_curve[:, 0], sampled_curve[:, 1], color='blue', label='Sampled points')
plt.legend()
plt.show()


np.savetxt('sampled_curve.csv', sampled_curve, delimiter=',', header='distance,altitude', comments='')`

which gave the following result: Result

EDIT: Here is a course profile relatively clean of markings, for testing methods: Profile 2

Upvotes: 0

Views: 658

Answers (2)

After fine-tuning the code provided to scrape the following image: enter image description here

by parsing the edge of the green area, the following plot can be generated:2

Although this can be considered to be a good result in itself, the smoothness of the curve can be further enhanced using a robust Savitzky-Golay filter. On the other hand, when exporting the results of the original parsing script, the export might not contain a sufficient number of data pairs, for recreating the curve. Therfore, I applied cubic spline interpolation on the data after exporting, and then redrawing the curve.

    import pandas as pd
import numpy as np
from scipy.interpolate import CubicSpline
import matplotlib.pyplot as plt
from scipy.signal import savgol_filter

def interpolate_data(excel_file, output_file):
    df = pd.read_excel(excel_file)
    
    if not all(col in df.columns for col in ['Distance (m)', 'Altitude (m)']):
        raise ValueError("Excel file must contain 'Distance (m)' and 'Altitude (m)' columns")

    distances = df['Distance (m)'].values
    altitudes = df['Altitude (m)'].values
    
    sorted_indices = np.argsort(distances)
    distances = distances[sorted_indices]
    altitudes = altitudes[sorted_indices]
    
    min_distance = int(np.floor(distances.min()))
    max_distance = int(np.ceil(distances.max()))
    interpolated_distances = np.arange(min_distance, max_distance + 1)
    
    cs = CubicSpline(distances, altitudes, bc_type='natural')
    interpolated_altitudes = cs(interpolated_distances)
    
    interpolated_df = pd.DataFrame({
        'Distance (m)': interpolated_distances,
        'Altitude (m)': interpolated_altitudes
    })
    
    interpolated_df.to_excel(output_file, index=False)
    print(f"Interpolated data saved to {output_file}")

interpolate_data(r".xlsx", 'interpolated_data_tour.xlsx')

data_tour = pd.read_excel(r".xlsx")

altitude_tour = data_tour['Altitude (m)'].values
distance_tour = data_tour['Distance (m)'].values

# Savitzky-Golay filter
window_length = 801 #must be an uneven number
polyorder = 3 
smoothed_altitude_tour = savgol_filter(altitude_tour, window_length, polyorder)

# Plot the curve
fig, ax2 = plt.subplots(figsize=(10, 6))
ax2.plot(distance_tour, smoothed_altitude_tour, marker='', linewidth=3, color='blue', alpha=0.8)
ax2.set_ylim(-40, 700)
yticks = ax2.get_yticks()[1:-1]  # Remove the first and last tick
ax2.set_yticks(yticks)
ax2.set_yticklabels([''] + [f'{int(tick)}' for tick in yticks[1:]])
ax2.set_xlim(0, max(distance_tour))
xticks = [0, 5000, 10000, 15000, 20000, 25000, 30000, 33520]  
xticklabels = ['0', '5', '10', '15', '20', '25', '30', '33.8']  
ax2.set_xticks(xticks)
ax2.set_xticklabels(xticklabels)
ax2.set_ylabel('Altitude [m]')
ax2.set_xlabel('Distance [km]')
ax2.set_title('Final plot')
ax2.grid(False)
plt.show()

Thus the final output can look like this:enter image description here

Upvotes: 0

Barış Aktaş
Barış Aktaş

Reputation: 426

This question has been like a fun puzzle for me, i have long answer for you. I tried my best to make the code simple and commented but still its long and complex. First i will explain the pipeline:

    1. Enter your graphs x-axis limits and y-axis limits and select the exact region including the whole graph.
    1. Crop the selected region and turn it into gray image.
    1. Get profile lines along the width of the gray image.
    1. For each profile line find falling edges and rising edges through the profile points.A point from falling edge here will have a negative difference with the previous point and the opposite is true for rising edges.
    1. From the falling and rising edges find all the sinks in the profile data. _A sink here is a falling edge followed by a rising edge.Now one of theese sinks is coming from the curve as a result of the dark pixels of the curve on the white background.
    1. Then from the sinks find the curve point with this algorithm:

      6.1. If there is only one sink then its center is the curve point

      6.2. If there are multiple sinks detected then the one closest to previous point is the curve point

      6.3. If there is no sink detected no curve point on this profile

As curve's color is dark(low gray value) and the background is white (high gray value) the drop between the pixel value remains mostly the same (around 180 for your graph). Teherefore by catching every gray level drop on that range you can reconstruct the complete curve.

In the below image a profile line is drawn along the blue line and its point plot is shown. As you can see from the plot there is a big drop which corresponds to a curve point, and it is detected by the above method and it is shown as a red circle on the profile line.

Example Profile

Here all the profiles and the detected points are shown:

All Profiles Drawn

Here is the curve regenerated and plotted again: Re-Constructed Curve

Here is the complete code, there are many parameters for fine tuning for different cases.

import numpy
import skimage
import cv2
from matplotlib import pyplot

#Function to take difference between two consecutive points
now = 0
def differentiator(variable):
    global now
    before = now
    now = variable
    return now-before

#Function to find rising edges from profile data
def findRisingEdges(profile, min_height=1, max_height=255, min_width=1, max_width=255,smoothing=10):
    profile_diff = [differentiator(p) for p in profile]
    if smoothing:
        profile_diff = [pd if abs(pd)>smoothing else 0 for pd in profile_diff]
    rising_edges = []
    
    i=0
    while i < len(profile_diff): # do not check the last point
        edge_start,edge_end = None,None
        diff = profile_diff[i]

        #Search positive derivative for rising edge
        if diff > 0:
            edge_start = i-1 if i>1 else i

            #Find where the edge ends by searching non-positive derivative
            if i == (len(profile_diff)-1):# if its the last point then it is edge end
                edge_end = i
                i = i + 1
            else:
                while profile_diff[i+1]>0:
                    i = i + 1
                    if i == (len(profile_diff)-1):
                        break
                edge_end = i
                i = i + 1

            edge_width = edge_end - edge_start
            edge_height = profile[edge_end] - profile[edge_start]

            if edge_width >= min_width and edge_width <= max_width and edge_height >= min_height and edge_height <= max_height:
                rising_edges.append([edge_start,edge_end,edge_width,edge_height])
        else:
            i = i + 1

    return rising_edges

#Function to find falling edges from profile data
def findFallingEdges(profile, min_height=1, max_height=255, min_width=1, max_width=255,smoothing=10):
    profile_diff = [differentiator(p) for p in profile]
    if smoothing:
        profile_diff = [pd if abs(pd)>smoothing else 0 for pd in profile_diff]

    falling_edges = []
    i=0
    while i < len(profile_diff): # do not check the last point
        edge_start,edge_end = None,None
        diff = profile_diff[i]
        #Search negative derivative for rising edge
        if diff < 0:
            edge_start = i-1 if i>1 else i

            #Find where the edge ends by searching non-negative derivative
            if i == (len(profile_diff)-1):# if its the last point then it is edge end
                edge_end = i
                i = i + 1
            else:
                while profile_diff[i+1]<0:
                    i = i + 1
                    if i == (len(profile_diff)-1):
                        break
                edge_end = i
                i = i + 1
            
            edge_width = edge_end - edge_start
            edge_height = abs(profile[edge_end] - profile[edge_start])

            if edge_width >= min_width and edge_width <= max_width and edge_height >= min_height and edge_height <= max_height:
                falling_edges.append([edge_start,edge_end,edge_width,edge_height])
        else:
            i = i + 1

    return falling_edges

#Function to find sinks on the profile by looking for falling edges followed by rising edges
def findSinks(profile, min_width=3, min_depth=50, smoothing=10,
                min_fe_height=1, max_fe_height=255, min_fe_width=1, max_fe_width=255, 
                min_re_height=1, max_re_height=255, min_re_width=1, max_re_width=255):
    
    rEdges = findRisingEdges(profile, min_height=min_fe_height, max_height=max_fe_height, min_width=min_fe_width, max_width=max_fe_width,smoothing=smoothing)
    fEdges = findFallingEdges(profile, min_height=min_re_height, max_height=max_re_height, min_width=min_re_width, max_width=max_re_width,smoothing=smoothing)

    sinks = []
    for rising_edge in rEdges:
        rising_edge_start,rising_edge_end,reWidth,reHeight = rising_edge 
        falling_edge_starts = [f[0] for f in fEdges]

        #Find all the sinks
        for y in reversed(range(0,rising_edge_end)):

            if y in falling_edge_starts:
                falling_edge_start,falling_edge_end,feWidth,feHeight = fEdges[falling_edge_starts.index(y)]
                
                sinkwidth = rising_edge_end-falling_edge_start+1
                depth = (feHeight+reHeight)//2
                if sinkwidth>=min_width and depth>=min_depth:
                    sinks.append([falling_edge_start,rising_edge_end,depth])
                break
    return sinks

#Function to find the curve from the image 
def extractCurve(src_image, profile_interval=5):
    #Read the image
    h,w,c = src_image.shape
    gray = cv2.cvtColor(src_image,cv2.COLOR_BGR2GRAY)

    #Scan the image through it's width(x-axis) get profile line extract curve's points
    last_y_level = 0 #this will always contain the latest curve point
    curve_points = []
    for xi in range(w//profile_interval):
        #Draw the profile on image
        cv2.line(graph,(xi*profile_interval,0),(xi*profile_interval,h),(255,0,0),1)

        #Get the profile
        profile = skimage.measure.profile_line(gray, (0,xi*profile_interval), (h,xi*profile_interval), linewidth=1, mode='constant') #Take the profile line

        #Find all the sinks in the profile data one of thesee points belongs to the curve
        sinks = findSinks(profile,smoothing=10,min_fe_height=150,min_re_height=150)

        if len(sinks)==0:   #If no sink no curve point
            pass
        
        elif len(sinks)==1: #If 1 sink it is the curve point
            start,end,depth = sinks[0]
            pX,pY = xi*profile_interval,(start+end)//2
            cv2.circle(src_image,(pX,pY),4,(0,0,255),-1)
            curve_points.append([pX,pY])
            last_y_level = pY
        
        else:   #If multiple sinks choose the one closest to last curve point
            closest = sinks[0]
            start,end,depth = closest
            min_y_dist = abs((start+end)//2 - last_y_level)
            sinks.pop(0)
            for sink in sinks:
                start,end,depth = sink
                y_dist = abs((start+end)//2 - last_y_level)
                if y_dist < min_y_dist:
                    min_y_dist = y_dist
                    closest = sink

            start,end,depth = closest
            pX,pY = xi*profile_interval,(start+end)//2
            cv2.circle(src_image,(pX,pY),4,(0,0,255),-1)
            curve_points.append([pX,pY])
            last_y_level = pY

        cv2.imshow('graph',graph)
        cv2.waitKey(1)
        # pyplot.plot(profile,'o-')
        # pyplot.show()

    return curve_points

#Read the image
image = cv2.imread('graph.png')

#Choose the region of interest including excat boundries the graph
rx,ry,rw,rh = cv2.selectROI('Select The Complete and Exact Boundaries of Graph',image)
graph = image[ry:ry+rh,rx:rx+rw]
cv2.destroyWindow('Select The Complete and Exact Boundaries of Graph')

#Enter the min and max values from the source graph here
y_min,y_max = 610, 680
x_min,x_max = 0, 3500

#Extract the curve points on the image
curve = extractCurve(graph)

#Map curve (x,y) pixel points to actual data points from graph
curve_normalized = [[int((cx/rw)*(x_max-x_min)+x_min),int((1-cy/rh)*(y_max-y_min)+y_min)] for cx,cy in curve]
curve_normalized = numpy.array(curve_normalized)

#Plot the newly constructed curve
pyplot.figure(figsize=(15,7))
pyplot.plot(curve_normalized[:,0],curve_normalized[:,1],'o-',linewidth=3)
pyplot.title('Curve Re-Constructed')
pyplot.grid(True)
pyplot.show()

I cant explain the complete code here but these are the keypoints:

  • profile_interval in the extractCurve() function is the number of pixel between to consecutive profiles. For better approximation you may keep it small.

  • smoothing in findSinks() function is added to prevent the noise or gridlines to be slected as candidate curve points. Difference less then smoothing will be considered 0, so that the sharp changes are detected as curve points.

  • Finally if your images are cleaner then the one you added here the curve extraction will be better.

Also feel free to ask further explanations.

Upvotes: 1

Related Questions