Reputation: 7227
We have a data frame with a sorted float index and two columns that should be the same. Their values are not always present, and in the worst case scenario, they do not have overlaps in the index values. The goal is to be able to check how far they are from each other.
I was thinking about interpolating the missing values and then calculating the distance. This would result in a large collection of index values for which this distance can be calculated.
Another approach would be to compare the actual values, and come up with an index error for which this comparison would make sense.
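A minimal sketch of this second approach, assuming pandas' `merge_asof` is an acceptable way to pair observed values whose index positions are close. The tolerance of 0.015 and the helper column names `x` and `x2` are hypothetical choices for illustration, not something given by the data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a1': [6.1, np.nan, 6.8, 7.5, 7.9],
                   'a2': [6.2, 6.6, 6.8, np.nan, 7.7]},
                  index=[0.10, 0.11, 0.13, 0.16, 0.17])

# Keep only the observed values of each column and expose the index as a column.
left = df['a1'].dropna().rename_axis('x').reset_index()
right = df['a2'].dropna().rename_axis('x2').reset_index()

tol = 0.015  # hypothetical index error within which two samples are comparable

# Pair every a1 sample with the nearest a2 sample, but only within the tolerance.
matched = pd.merge_asof(left, right, left_on='x', right_on='x2',
                        direction='nearest', tolerance=tol)
matched = matched.dropna(subset=['a2'])  # drop a1 samples without a partner

# Mean absolute difference over the matched pairs; 0 would mean identical values.
mean_abs_diff = (matched['a1'] - matched['a2']).abs().mean()
print(matched)
print(mean_abs_diff)
```

No values are invented here, but the result depends on the chosen tolerance, and one sample of a2 can be paired with more than one sample of a1.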
The question is which approach makes more sense, and how to calculate the distance. The result should tell us how close the columns are to each other, with, for example, 0 meaning that they are the same.
Example
We have a data frame with two columns, a1 and a2, and a sorted float index.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'a1': [6.1, np.nan, 6.8, 7.5, 7.9],
                       'a2': [6.2, 6.6, 6.8, np.nan, 7.7]},
                      index=[0.10, 0.11, 0.13, 0.16, 0.17])
           a1   a2
    0.10  6.1  6.2
    0.11  NaN  6.6
    0.13  6.8  6.8
    0.16  7.5  NaN
    0.17  7.9  7.7
Upvotes: 0
Views: 707
Reputation: 2545
If your objective is to get the absolute distance between the interpolated vectors, you can proceed as follows:
    r = df.interpolate()
    absolute_sum = (r["a1"] - r["a2"]).abs().sum()
With the given example the result is 0.7000000000000011.
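Note that `interpolate()` with its default linear method treats the rows as equally spaced and ignores the float index. A hedged variant, assuming the index spacing should be taken into account and that a mean (rather than a sum) is wanted so the number does not grow with the row count:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a1': [6.1, np.nan, 6.8, 7.5, 7.9],
                   'a2': [6.2, 6.6, 6.8, np.nan, 7.7]},
                  index=[0.10, 0.11, 0.13, 0.16, 0.17])

# method='index' interpolates using the numeric index values as x coordinates.
r_idx = df.interpolate(method='index')

# Mean absolute difference between the two interpolated columns; 0 means identical.
mean_abs = (r_idx['a1'] - r_idx['a2']).abs().mean()
print(r_idx)
print(mean_abs)
```

With this example the missing a1 value at 0.11 becomes roughly 6.33 instead of 6.45, so the result differs from the equally spaced version.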
If you are instead interested in how similar the two columns are, you could look at the correlation coefficient:
    r = df.interpolate()
    correlation = r["a1"].corr(r["a2"])
With the given example the result is 0.9929580338258082.
Upvotes: 1
Reputation: 323226
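Since you mention distance, you can interpolate and then use scipy's cdist, which gives the pairwise Euclidean distance between the rows (one row per index value):
    from scipy.spatial import distance
    df = df.interpolate(axis=0)
    pd.DataFrame(distance.cdist(df.values, df.values, 'euclidean'), columns=df.index, index=df.index)
Out[468]: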
              0.10      0.11      0.13      0.16      0.17
    0.10  0.000000  0.531507  0.921954  1.750000  2.343075
    0.11  0.531507  0.000000  0.403113  1.234909  1.820027
    0.13  0.921954  0.403113  0.000000  0.832166  1.421267
    0.16  1.750000  1.234909  0.832166  0.000000  0.602080
    0.17  2.343075  1.820027  1.421267  0.602080  0.000000
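The matrix above compares rows with each other. If a single number for the distance between columns a1 and a2 is what is needed, a minimal sketch is to treat the two interpolated columns as vectors. This uses `np.linalg.norm` as an assumption to keep the example free of extra dependencies; `scipy.spatial.distance.euclidean` on the same two vectors should give the same Euclidean value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a1': [6.1, np.nan, 6.8, 7.5, 7.9],
                   'a2': [6.2, 6.6, 6.8, np.nan, 7.7]},
                  index=[0.10, 0.11, 0.13, 0.16, 0.17])

r = df.interpolate(axis=0)  # default linear interpolation, rows equally spaced

# Element-wise difference between the two columns.
diff = (r['a1'] - r['a2']).to_numpy()

# Euclidean distance between the columns; 0 means they are identical.
euclid = np.linalg.norm(diff)

# Root-mean-square difference: the same idea, normalised by the number of rows.
rms = np.sqrt(np.mean(diff ** 2))
print(euclid, rms)
```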
Upvotes: 0