Python - Interpolation of plots

Question

For my evaluation, I have used gnuplot to plot data from two separate csv files (found in this link: https://drive.google.com/open?id=0B2Iv8dfU4fTUZGV6X1Bvb3c4TWs) with a different number of rows which generates the following graph.

These data seem to have no common timestamp (the first column) in both csv files and yet gnuplot seems to fit the plotting as shown above.

Here is the gnuplot script that I use to generate my plot.

# ###### GNU Plot

set style data lines
set terminal postscript eps enhanced color "Times" 20

set output "output.eps"

set title "Actual vs. Estimated Comparison"

set style line 99 linetype 1 linecolor rgb "#999999" lw 2
#set border 1 back ls 11
set key right top
set key box linestyle 50
set key width -2
set xrange [0:10]
set key spacing 1.2
#set nokey

set grid xtics ytics mytics
#set size 2
#set size ratio 0.4

#show timestamp
set xlabel "Time [Seconds]"
set ylabel "Segments"

set style line 1 lc rgb "#ff0000" lt 1 pi 0 pt 4 lw 4 ps 0

plot  "estimated.csv" using ($1):2 with lines title "Estimated", "actual.csv" using ($1):2 with lines title "Actual";

I wanted to interpolate my green line into the grid where my pink line is defined, then compare the two. Here is my initial approach

#!/usr/bin/env python
import sys

import numpy as np
from shapely.geometry import LineString
#-------------------------------------------------------------------------------
def load_data(fname):
    return LineString(np.genfromtxt(fname, delimiter = ','))
#-------------------------------------------------------------------------------
lines = list(map(load_data, sys.argv[1:]))

for g in lines[0].intersection(lines[1]):
    if g.geom_type != 'Point':
        continue
    print('%f,%f' % (g.x, g.y))
Then in Gnuplot, one can invoke it directly:

set terminal pngcairo
set output 'fig.png'

set datafile separator comma
set yr [0:700]
set xr [0:10]

set xtics 0,2,10
set ytics 0,100,700

set grid

set xlabel "Time [seconds]"
set ylabel "Segments"

plot \
    'estimated.csv' w l lc rgb 'dark-blue' t 'Estimated', \
    'actual.csv' w l lc rgb 'green' t 'Actual', \
    '



which gives us the following plot



I wrote the filtered points to another file (filtered_points.csv found in this link:https://drive.google.com/open?id=0B2Iv8dfU4fTUSHVOMzYySjVzZWc) from this script. However, the filtered points are less than 10% of the actual dataset (which is the ground truth).

Is there any way where we can interpolate the two lines by ignoring the pink high peaks above the green plot using python? Gnuplot doesn't seem to be the best tool for this. If the pink line doesn't touch the green line (i.e. if it is way below the green line), I want to take the values of the closest green line so that it will be a one-to-one correspondence (or very close) with the actual dataset. I want to return the interpolated values for the green line in the pink line grid so that we can compare both lines since they have the same array size.

Franz · Accepted Answer

Getting the same data size in terms of an interpolation is pretty simple by numpy.interp(). For me, this code works:

import numpy as np
import matplotlib.pyplot as plt

names = ['actual.csv','estimated.csv']
#-------------------------------------------------------------------------------
def load_data(fname):
    return np.genfromtxt(fname, delimiter = ',')
#-------------------------------------------------------------------------------

data = [load_data(name) for name in names]
actual_data = data[0]
estimated_data = data[1]
interpolated_estimation = np.interp(estimated_data[:,0],actual_data[:,0],actual_data[:,1])

plt.figure()
plt.plot(actual_data[:,0],actual_data[:,1], label='actual')
plt.plot(estimated_data[:,0],estimated_data[:,1], label='estimated')
plt.plot(estimated_data[:,0],interpolated_estimation, label='interpolated')
plt.legend()
plt.show(block=True)

After this interpolation interpolated_estimation has the same size as the x axis of actual_data, as the plot suggests. The slicing is a bit confusing but I tried to use your function and make the plot calls as clear as possible.

To save to a file and plot like suggested I changed the code to:

import numpy as np
import matplotlib.pyplot as plt

names = ['actual.csv','estimated.csv']
#-------------------------------------------------------------------------------
def load_data(fname):
    return np.genfromtxt(fname, delimiter = ',')
#-------------------------------------------------------------------------------

data = [load_data(name) for name in names]
actual_data = data[0]
estimated_data = data[1]
interpolated_estimation = np.interp(estimated_data[:,0],actual_data[:,0],actual_data[:,1])

plt.figure()
plt.plot(actual_data[:,0],actual_data[:,1], label='actual')
#plt.plot(estimated_data[:,0],estimated_data[:,1], label='estimated')
plt.plot(estimated_data[:,0],interpolated_estimation, label='interpolated')
np.savetxt('interpolated.csv',
       np.vstack((estimated_data[:,0],interpolated_estimation)).T,
       delimiter=',', fmt='%10.5f') #saves data to filedata to file
plt.legend()
plt.title('Actual vs. Interpolated')
plt.xlim(0,10)
plt.ylim(0,500)
plt.xlabel('Time [Seconds]')
plt.ylabel('Segments')
plt.grid()
plt.show(block=True)

This produces the following output:

Python - Interpolation of plots

Answers (1)

Related Questions