Reputation: 11
I have created figures similar to this one here:
My goal here is to take each blue point and calculate the shortest distance it would take to get to any point on the red line. Ideally, this could be used to select the x% closest points or those falling within a certain distance, but the primary issue here is calculating each distance in the first place.
The points were taken from a data file and plotted as such:
data = np.loadtxt('gr.dat')
...
ax.scatter(data[:,0],data[:,1])
whereas the red line is a calculated Baraffe track where all points used to create the line were stored in a dat file and plotted via:
df = pd.read_csv('baraffe.dat', sep=r"\s+", names=['mass', 'age', 'g', 'r', 'i'])
df2 = pd.DataFrame(df, columns=["mass", "age", "g", "r", "i"])
df2['b_color'] = df2['g'] - df2['r']
df2.plot(ax=ax, x='b_color',y='g', color="r")
...
This is my first attempt at using pandas so I know my code could definitely be optimized and is likely redundant, but it does output the figure attached.
Essentially, I want to calculate the smallest distance each dot would have to move (in both x and y) to reach any point on the red line. I tried to mimic the answer in (here), but I am unsure how to apply that definition to a dataframe or a larger array without always getting a TypeError. Any insight would be greatly appreciated, and thank you!
Upvotes: 1
Views: 220
Reputation: 25043
Use scipy.spatial.KDTree.
Once you have built the KDTree on the points of the Baraffe track, you can use the different methods of the KDTree instance to compute all the quantities that interest you.
Here, for simplicity, I have just shown how to use the query method to pair each scattered point with its nearest neighbor among the points that define the curve.
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import KDTree

np.random.seed(20230307)

# synthetic curve (stands in for the track), rotated so it is not axis-aligned
x = np.linspace(0, 10, 51)
y = np.sin(x)*0.7
x, y = +x*0.6+y*0.8, -0.8*x+0.6*y

# synthetic scattered points, rotated the same way
xp = np.linspace(1, 9, 21)
yp = -1+np.random.rand(21)*0.4
xp, yp = +xp*0.6+yp*0.8, -0.8*xp+0.6*yp

kdt = KDTree(np.vstack((x, y)).T) # the array that is indexed must be N×2
# for each scattered point: distance to, and index of, the nearest curve point
distances, indices = kdt.query(np.vstack((xp, yp)).T, k=1)

fig, ax = plt.subplots()
ax.set_aspect(1)
ax.plot(x, y, color='k', lw=0.8)
ax.scatter(xp, yp, color='r')
# join each scattered point to its nearest curve point
for x0, y0, i in zip(xp, yp, indices):
    ax.plot((x0, x[i]), (y0, y[i]), color='g', lw=0.5)
plt.show()
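To connect this to the data in the question, here is a minimal sketch of the same idea applied to a DataFrame and a NumPy array. I do not have 'baraffe.dat' or 'gr.dat', so df2 and data below are filled with made-up stand-in values; the column names b_color and g follow the question, while max_dist = 0.1 and the 10% cutoff are arbitrary example values, not recommendations.

```python
import numpy as np
import pandas as pd
from scipy.spatial import KDTree

# Stand-in data (assumed): in the question these come from
# 'baraffe.dat' (the track) and 'gr.dat' (the scattered points).
rng = np.random.default_rng(0)
df2 = pd.DataFrame({"g": np.linspace(10.0, 20.0, 200)})
df2["r"] = df2["g"] - (0.5 + 0.1 * (df2["g"] - 10.0))
df2["b_color"] = df2["g"] - df2["r"]
data = np.column_stack((rng.uniform(0.4, 1.6, 500),
                        rng.uniform(10.0, 20.0, 500)))

# Track points in plot coordinates (x = b_color, y = g), shape (N, 2).
track = df2[["b_color", "g"]].to_numpy()
tree = KDTree(track)

# Distance from every data point to its nearest track point,
# and the row index of that track point in df2.
distances, indices = tree.query(data[:, :2], k=1)

# Points lying within a fixed distance of the track (example threshold).
max_dist = 0.1
within = data[distances <= max_dist]

# The closest 10% of the points (example percentage).
cutoff = np.percentile(distances, 10)
closest = data[distances <= cutoff]
```

Two things to keep in mind: the distances are measured to the nearest stored track point, not to the straight segments drawn between them, so a sparsely sampled track may need to be interpolated more finely first; and the result is a plain Euclidean distance in (color, magnitude) units, so if the two axes span very different ranges you may want to rescale them before building the tree.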
Upvotes: 1