Reputation: 4478
I have two 1D arrays, and I need to find out the Mahalanobis distance between them.
Array 1
-0.125510275,0.067021735,0.140631825,-0.014300184,-0.122152582,0.002372072,-0.050777748,-0.106606245,0.149123222,-0.159149423,0.210138127,0.031959131,-0.068411253,-0.038253143,-0.024590122,0.101361006,-0.160774037,-0.183688596,-0.07163775,-0.096662685,-0.000117288,0.14251323,-0.030461289,-0.006710192,-0.217195332,-0.338565469,-0.030219197,-0.100772612,0.144092739,-0.092911556,-0.008420993,0.042907588,-0.212668449,-0.009366207,-7.01E-05,0.134508118,-0.015715659,-0.050884761,0.18804647,0.04946585,-0.242626131,0.099951334,0.053660966,0.275807977,0.216019884,-0.009127878,0.019819722,-0.043750495,0.12940146,-0.259942383,0.061821692,0.107142501,0.098196507,0.022301452,0.079412982,-0.131031215,-0.049483716,0.126781181,-0.195536733,0.077051811,0.061049294,-0.039563753,0.02573989,0.025330214,0.204785526,0.099218346,-0.050533134,-0.109173119,0.205652237,-0.168003649,-0.062734045,0.100320764,-0.063513778,-0.120843001,-0.223983109,0.075016715,0.481291831,0.107607022,-0.141365036,0.075003348,-0.042418435,-0.041501854,0.096700639,0.083469011,-0.033227846,-0.050748199,-0.045331556,0.065955319,0.26927036,0.082820699,-0.014033476,0.176714703,0.042264186,-0.011814327,0.041769091,-0.00132945,-0.114337325,-0.013483777,-0.111367472,-0.051828772,-0.022199111,0.030011443,0.015529033,0.171916366,-0.172722578,0.214662731,-0.0219073,-0.067695767,0.040487193,0.04814541,0.003313571,-0.01360167,0.115932293,-0.235844463,0.185181856,0.130868644,0.010789306,0.171733275,0.059378762,0.003508842,0.039326921,0.024174646,-0.195897669,-0.088932432,0.025385177,-0.134177506,0.08158315,0.049005955
And, Array 2
-0.120652862,0.030241199,0.146165773,-0.044423241,-0.138606027,-0.048646796,-0.00780057,-0.101798892,0.185339138,-0.210505784,0.1637595,0.015000292,-0.10359703,0.102251172,-0.043159217,0.183324724,-0.171825036,-0.173819616,-0.112194099,-0.161590934,-0.002507193,0.163269699,-0.037766434,0.041060638,-0.178659558,-0.268946916,-0.055348843,-0.11808344,0.113775767,-0.073903576,-0.039505914,0.032382272,-0.159118786,0.007761603,0.057116233,0.043675732,-0.057895001,-0.104836114,0.22844176,0.055832602,-0.245030299,0.006276659,0.140012532,0.21449241,0.159539059,-0.049584024,0.016899824,-0.074179329,0.119686954,-0.242336214,-0.001390997,0.097442642,0.059720818,0.109706804,0.073196828,-0.16272822,0.022305552,0.102650747,-0.192103565,0.104134969,0.099571452,-0.101140082,-0.038911857,0.071292967,0.202927336,0.12729995,-0.047885433,-0.165100336,0.220239595,-0.19612211,-0.075948663,0.096906625,-0.07410948,-0.108219706,-0.155030385,-0.042231761,0.484629512,0.093194947,-0.105109185,0.072906494,-0.056871444,-0.057923764,0.101847053,0.092042476,-0.061295755,-0.031595342,-0.01854251,0.074671492,0.266587347,0.052284949,0.003548023,0.171518356,0.053180017,-0.022400264,0.061757766,0.038441688,-0.139473096,-0.05759665,-0.101672307,-0.074863717,-0.02349415,-0.011674869,0.010008151,0.141401738,-0.190440938,0.216421023,-0.028323224,-0.078021556,-0.011468113,0.100600921,-0.019697987,-0.014288296,0.114862509,-0.162037179,0.171686187,0.149788797,-0.01235011,0.136169329,0.008751356,0.024811052,0.003802934,0.00500867,-0.1840965,-0.086204343,0.018549766,-0.110649876,0.068768717,0.03012047
I found that SciPy already implements this as scipy.spatial.distance.mahalanobis. However, I am confused about what the value of the inverse covariance argument should be (I call it IV below; SciPy names it VI). I tried the following:
V = np.cov(np.array([array_1, array_2]))
IV = np.linalg.inv(V)
print(mahalanobis(array_1, array_2, IV))
But, I get the following error:
File "C:\Users\XXXXXX\AppData\Local\Continuum\anaconda3\envs\face\lib\site-packages\scipy\spatial\distance.py", line 1043, in mahalanobis m = np.dot(np.dot(delta, VI), delta)
ValueError: shapes (128,) and (2,2) not aligned: 128 (dim 0) != 2 (dim 0)
EDIT:
array_1 = [-0.10577646642923355, 0.09617947787046432, 0.029290344566106796, 0.02092641592025757, -0.021434104070067406, -0.13410840928554535, 0.028282659128308296, -0.12082239985466003, 0.21936850249767303, -0.06512433290481567, 0.16812698543071747, -0.03302834928035736, -0.18088334798812866, -0.04598559811711311, -0.014739632606506348, 0.06391328573226929, -0.15650317072868347, -0.13678401708602905, 0.01166679710149765, -0.13967938721179962, 0.14632365107536316, 0.025218486785888672, 0.046839646995067596, 0.09690812975168228, -0.13414686918258667, -0.2883925437927246, -0.1435326784849167, -0.17896348237991333, 0.10746842622756958, -0.09142691642045975, 0.04860316216945648, 0.031577128916978836, -0.17280976474285126, -0.059613555669784546, -0.05718057602643967, 0.0401446670293808, 0.026440180838108063, -0.017025159671902657, 0.22091664373874664, 0.024703698232769966, -0.15607595443725586, -0.0018572667613625526, -0.037675946950912476, 0.3210170865058899, 0.10884962230920792, 0.030370134860277176, 0.056784629821777344, -0.030112050473690033, 0.023124486207962036, -0.1449904441833496, 0.08885903656482697, 0.17527811229228973, 0.08804896473884583, 0.038310401141643524, -0.01704210229218006, -0.17355971038341522, -0.018237406387925148, 0.030551932752132416, -0.23085585236549377, 0.13475817441940308, 0.16338199377059937, -0.06968289613723755, -0.04330683499574661, 0.04434924200177193, 0.22637797892093658, 0.07463733851909637, -0.15070196986198425, -0.07500549405813217, 0.10863590240478516, -0.22288714349269867, 0.0010778247378766537, 0.057608842849731445, -0.12828609347343445, -0.17236559092998505, -0.23064571619033813, 0.09910193085670471, 0.46647992730140686, 0.0634111613035202, -0.13985536992549896, 0.052741192281246185, -0.1558966338634491, 0.022585246711969376, 0.10514408349990845, 0.11794176697731018, -0.06241249293088913, 0.06389056891202927, -0.14145469665527344, 0.060088545083999634, 0.09667345881462097, -0.004665130749344826, -0.07927791774272919, 
0.21978208422660828, -0.0016187895089387894, 0.04876316711306572, 0.03137822449207306, 0.08962501585483551, -0.09108036011457443, -0.01795950159430504, -0.04094596579670906, 0.03533276170492172, 0.01394269522279501, -0.08244197070598602, -0.05095399543642998, 0.04305890575051308, -0.1195211187005043, 0.16731074452400208, 0.03894471749663353, -0.0222858227789402, -0.07944411784410477, 0.0614166259765625, -0.1481470763683319, -0.09113290905952454, 0.14758692681789398, -0.24051085114479065, 0.164126917719841, 0.1753545105457306, -0.003193420823663473, 0.20875433087348938, 0.03357946127653122, 0.1259773075580597, -0.00022807717323303223, -0.039092566817998886, -0.13582147657871246, -0.01937306858599186, 0.015938198193907738, 0.00787206832319498, 0.05792934447526932, 0.03294186294078827]
array_2 = [-0.1966051608324051, 0.0940953716635704, -0.0031937970779836178, -0.03691547363996506, -0.07240629941225052, -0.07114037871360779, -0.07133384048938751, -0.1283963918685913, 0.15377545356750488, -0.091400146484375, 0.10803385823965073, -0.09235749393701553, -0.1866973638534546, -0.021168243139982224, -0.09094691276550293, 0.07300164550542831, -0.20971564948558807, -0.1847742646932602, -0.009817334823310375, -0.05971141159534454, 0.09904412180185318, 0.0278592761605978, -0.012554554268717766, 0.09818517416715622, -0.1747943013906479, -0.31632938981056213, -0.0864541232585907, -0.13249783217906952, 0.002135572023689747, -0.04935726895928383, 0.010047778487205505, 0.04549024999141693, -0.26334646344184875, -0.05263081565499306, -0.013573898002505302, 0.2042253464460373, 0.06646320968866348, 0.08540669083595276, 0.12267164140939713, -0.018634958192706108, -0.19135263562202454, 0.01208433136343956, 0.09216200560331345, 0.2779296934604645, 0.1531585156917572, 0.10681629925966263, -0.021275708451867104, -0.059720948338508606, 0.06610126793384552, -0.21058350801467896, 0.005440462380647659, 0.18833838403224945, 0.08883830159902573, 0.025969548150897026, 0.0337764173746109, -0.1585341989994049, 0.02370697632431984, 0.10416869819164276, -0.19022507965564728, 0.11423652619123459, 0.09144753962755203, -0.08765758574008942, -0.0032832929864525795, -0.0051014479249715805, 0.19875964522361755, 0.07349056005477905, -0.1031823456287384, -0.10447365045547485, 0.11358538269996643, -0.24666038155555725, -0.05960353836417198, 0.07124857604503632, -0.039664581418037415, -0.20122921466827393, -0.31481748819351196, -0.006801256909966469, 0.41940364241600037, 0.1236235573887825, -0.12495145946741104, 0.12580059468746185, -0.02020396664738655, -0.03004150651395321, 0.11967054009437561, 0.09008713811635971, -0.07470540702342987, 0.09324200451374054, -0.13763070106506348, 0.07720538973808289, 0.19568027555942535, 0.036567769944667816, 0.030284458771348, 0.14119629561901093, 
-0.03820852190256119, 0.06232285499572754, 0.036639824509620667, 0.07704029232263565, -0.12276224792003632, -0.0035170004703104496, -0.13103705644607544, 0.027697769924998283, -0.01527332328259945, -0.04027168080210686, -0.03659897670149803, 0.03330300375819206, -0.12293602526187897, 0.09043421596288681, -0.019673841074109077, -0.07563626766204834, -0.13991905748844147, 0.014788001775741577, -0.07630413770675659, 0.00017269013915210962, 0.16345393657684326, -0.25710681080818176, 0.19869503378868103, 0.19393865764141083, -0.07422225922346115, 0.19553625583648682, 0.09189949929714203, 0.051557887345552444, -0.0008843056857585907, -0.006250975653529167, -0.1680600494146347, -0.10320111364126205, 0.03232177346944809, -0.08931156992912292, 0.11964476853609085, 0.00814182311296463]
The covariance matrix of the above arrays turns out to be singular, so I am unable to invert it. Why does it end up being singular?
EDIT 2: Solution
Since the covariance matrix here is singular, I had to compute its pseudo-inverse instead, using np.linalg.pinv(V).
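For reference, the pseudo-inverse approach can be sketched as follows. This is a minimal example, not the exact code from the question: the two arrays are random stand-ins for the 128-element vectors above, and the explicit rcond cutoff is an assumption chosen to make the rank-1 truncation robust.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the two 128-element arrays in the question.
array_1 = rng.normal(scale=0.1, size=128)
array_2 = rng.normal(scale=0.1, size=128)

# rowvar=False: each row is an observation, each column a variable -> 128x128.
V = np.cov(np.array([array_1, array_2]), rowvar=False)

# Two observations give a rank-1 matrix, so a plain inverse does not exist.
# Use the Moore-Penrose pseudo-inverse; rcond is an assumed cutoff for
# discarding singular values that are only floating-point noise.
VI = np.linalg.pinv(V, rcond=1e-10)

distance = mahalanobis(array_1, array_2, VI)
print(V.shape, np.linalg.matrix_rank(V), distance)
```

Note that when the covariance is estimated from the same two vectors being compared, V equals d d^T / 2 with d = array_1 - array_2, and the resulting distance works out to sqrt(2) whatever the data is. The number is therefore not informative on its own; a covariance estimated from more observations is needed for a meaningful distance.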
Upvotes: 5
Views: 8011
Reputation: 13999
From the numpy.cov docs, the first argument should be an array m such that:
Each row of m represents a variable, and each column a single observation of all those variables.
So to fix your code, just take the transpose (with .T) of your array before you call cov:
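import numpy as np
from scipy.spatial.distance import mahalanobis
# V is now 128x128; built from only two observations it is singular, so inv may fail
V = np.cov(np.array([array_1, array_2]).T)
IV = np.linalg.inv(V)
print(mahalanobis(array_1, array_2, IV))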
I just tested this out on some random data, and I can confirm it works.
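To see how the orientation of the input changes the result, here is a small sketch on synthetic data (the samples array is a hypothetical stand-in for the two stacked vectors). It only checks shapes, which is what the original ValueError was about.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 2 observations (rows) of 128 variables (columns).
samples = rng.normal(size=(2, 128))

# Default rowvar=True treats each ROW as a variable -> 2 variables, 2x2 matrix.
cov_rows = np.cov(samples)
# Transposing makes each of the 128 columns a variable -> 128x128 matrix.
cov_transposed = np.cov(samples.T)
# rowvar=False is equivalent to transposing the input.
cov_rowvar = np.cov(samples, rowvar=False)

print(cov_rows.shape, cov_transposed.shape, cov_rowvar.shape)
```

Passing rowvar=False instead of transposing is purely a matter of taste; both produce the same matrix.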
Also, calculating a covariance matrix from just two observations is a bad idea and is unlikely to give an accurate estimate. If your data is coming from an image, you should use the entire image img (or at least the entire region of interest) when calculating the covariance matrix, then use that matrix to find the Mahalanobis distance between the two vectors of interest:
V = np.cov(np.array(img))
IV = np.linalg.inv(V)
print(mahalanobis(array_1, array_2, IV))
You may or may not need to replace img with img.T, depending on how you generated array_1 and array_2 in the first place.
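As a rough illustration of estimating the covariance from a larger set of observations, the sketch below uses a synthetic reference array in place of real data. The name reference, its size of 1000 rows, and the Gaussian values are all assumptions; in practice it would hold many real vectors of the same kind as array_1 and array_2.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(0)
n_features = 128

# Hypothetical reference set: 1000 observations (rows) x 128 features (columns).
reference = rng.normal(scale=0.1, size=(1000, n_features))
# Hypothetical stand-ins for the two vectors being compared.
array_1 = rng.normal(scale=0.1, size=n_features)
array_2 = rng.normal(scale=0.1, size=n_features)

# Rows are observations here, so pass rowvar=False to get a 128x128 matrix.
V = np.cov(reference, rowvar=False)
rank = np.linalg.matrix_rank(V)

# With far more observations than features, V is full rank and can be inverted.
VI = np.linalg.inv(V)
distance = mahalanobis(array_1, array_2, VI)
print(V.shape, rank, distance)
```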
If you're getting singular covariance matrices, what you have is a math problem, not a code problem. It's apparently a common enough problem that the question "why is my covariance matrix singular?" has already been asked and answered. Very broadly, it happens when your data points do not span enough independent directions. A sample covariance matrix computed from n observations has rank at most n - 1, so with 128 variables you need at least 129 observations before it can be invertible at all. Using just two data points therefore guarantees a singular matrix here.
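One way to see the effect of the number of observations is to compute the rank of the sample covariance matrix for several sample sizes. The following is a minimal sketch on synthetic Gaussian data; the sample sizes and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 128

ranks = {}
for n_obs in (2, 50, 128, 129, 500):
    # Hypothetical data: n_obs observations (rows) of 128 variables (columns).
    data = rng.normal(size=(n_obs, n_features))
    V = np.cov(data, rowvar=False)          # always 128x128
    ranks[n_obs] = np.linalg.matrix_rank(V)
    # Centering the data costs one degree of freedom, so rank <= n_obs - 1.
    status = "invertible" if ranks[n_obs] == n_features else "singular"
    print(n_obs, ranks[n_obs], status)
```

For independent random data like this, the matrix only becomes full rank once there are more observations than variables. Real data with strongly correlated features can remain singular or badly conditioned even beyond that point.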
Upvotes: 4