Parthapratim Neog
Parthapratim Neog

Reputation: 4478

How to find Mahalanobis distance between two 1D arrays in Python?

I have two 1D arrays, and I need to find out the Mahalanobis distance between them.

Array 1

-0.125510275,0.067021735,0.140631825,-0.014300184,-0.122152582,0.002372072,-0.050777748,-0.106606245,0.149123222,-0.159149423,0.210138127,0.031959131,-0.068411253,-0.038253143,-0.024590122,0.101361006,-0.160774037,-0.183688596,-0.07163775,-0.096662685,-0.000117288,0.14251323,-0.030461289,-0.006710192,-0.217195332,-0.338565469,-0.030219197,-0.100772612,0.144092739,-0.092911556,-0.008420993,0.042907588,-0.212668449,-0.009366207,-7.01E-05,0.134508118,-0.015715659,-0.050884761,0.18804647,0.04946585,-0.242626131,0.099951334,0.053660966,0.275807977,0.216019884,-0.009127878,0.019819722,-0.043750495,0.12940146,-0.259942383,0.061821692,0.107142501,0.098196507,0.022301452,0.079412982,-0.131031215,-0.049483716,0.126781181,-0.195536733,0.077051811,0.061049294,-0.039563753,0.02573989,0.025330214,0.204785526,0.099218346,-0.050533134,-0.109173119,0.205652237,-0.168003649,-0.062734045,0.100320764,-0.063513778,-0.120843001,-0.223983109,0.075016715,0.481291831,0.107607022,-0.141365036,0.075003348,-0.042418435,-0.041501854,0.096700639,0.083469011,-0.033227846,-0.050748199,-0.045331556,0.065955319,0.26927036,0.082820699,-0.014033476,0.176714703,0.042264186,-0.011814327,0.041769091,-0.00132945,-0.114337325,-0.013483777,-0.111367472,-0.051828772,-0.022199111,0.030011443,0.015529033,0.171916366,-0.172722578,0.214662731,-0.0219073,-0.067695767,0.040487193,0.04814541,0.003313571,-0.01360167,0.115932293,-0.235844463,0.185181856,0.130868644,0.010789306,0.171733275,0.059378762,0.003508842,0.039326921,0.024174646,-0.195897669,-0.088932432,0.025385177,-0.134177506,0.08158315,0.049005955

And, Array 2

-0.120652862,0.030241199,0.146165773,-0.044423241,-0.138606027,-0.048646796,-0.00780057,-0.101798892,0.185339138,-0.210505784,0.1637595,0.015000292,-0.10359703,0.102251172,-0.043159217,0.183324724,-0.171825036,-0.173819616,-0.112194099,-0.161590934,-0.002507193,0.163269699,-0.037766434,0.041060638,-0.178659558,-0.268946916,-0.055348843,-0.11808344,0.113775767,-0.073903576,-0.039505914,0.032382272,-0.159118786,0.007761603,0.057116233,0.043675732,-0.057895001,-0.104836114,0.22844176,0.055832602,-0.245030299,0.006276659,0.140012532,0.21449241,0.159539059,-0.049584024,0.016899824,-0.074179329,0.119686954,-0.242336214,-0.001390997,0.097442642,0.059720818,0.109706804,0.073196828,-0.16272822,0.022305552,0.102650747,-0.192103565,0.104134969,0.099571452,-0.101140082,-0.038911857,0.071292967,0.202927336,0.12729995,-0.047885433,-0.165100336,0.220239595,-0.19612211,-0.075948663,0.096906625,-0.07410948,-0.108219706,-0.155030385,-0.042231761,0.484629512,0.093194947,-0.105109185,0.072906494,-0.056871444,-0.057923764,0.101847053,0.092042476,-0.061295755,-0.031595342,-0.01854251,0.074671492,0.266587347,0.052284949,0.003548023,0.171518356,0.053180017,-0.022400264,0.061757766,0.038441688,-0.139473096,-0.05759665,-0.101672307,-0.074863717,-0.02349415,-0.011674869,0.010008151,0.141401738,-0.190440938,0.216421023,-0.028323224,-0.078021556,-0.011468113,0.100600921,-0.019697987,-0.014288296,0.114862509,-0.162037179,0.171686187,0.149788797,-0.01235011,0.136169329,0.008751356,0.024811052,0.003802934,0.00500867,-0.1840965,-0.086204343,0.018549766,-0.110649876,0.068768717,0.03012047

I found that Scipy has already implemented the function. However, I am confused about what the value of IV should be. I tried to do the following

V = np.cov(np.array([array_1, array_2]))
IV = np.linalg.inv(V)
print(mahalanobis(array_1, array_2, IV))

But, I get the following error:

File "C:\Users\XXXXXX\AppData\Local\Continuum\anaconda3\envs\face\lib\site-packages\scipy\spatial\distance.py", line 1043, in mahalanobis m = np.dot(np.dot(delta, VI), delta)

ValueError: shapes (128,) and (2,2) not aligned: 128 (dim 0) != 2 (dim 0)

EDIT:

array_1 = [-0.10577646642923355, 0.09617947787046432, 0.029290344566106796, 0.02092641592025757, -0.021434104070067406, -0.13410840928554535, 0.028282659128308296, -0.12082239985466003, 0.21936850249767303, -0.06512433290481567, 0.16812698543071747, -0.03302834928035736, -0.18088334798812866, -0.04598559811711311, -0.014739632606506348, 0.06391328573226929, -0.15650317072868347, -0.13678401708602905, 0.01166679710149765, -0.13967938721179962, 0.14632365107536316, 0.025218486785888672, 0.046839646995067596, 0.09690812975168228, -0.13414686918258667, -0.2883925437927246, -0.1435326784849167, -0.17896348237991333, 0.10746842622756958, -0.09142691642045975, 0.04860316216945648, 0.031577128916978836, -0.17280976474285126, -0.059613555669784546, -0.05718057602643967, 0.0401446670293808, 0.026440180838108063, -0.017025159671902657, 0.22091664373874664, 0.024703698232769966, -0.15607595443725586, -0.0018572667613625526, -0.037675946950912476, 0.3210170865058899, 0.10884962230920792, 0.030370134860277176, 0.056784629821777344, -0.030112050473690033, 0.023124486207962036, -0.1449904441833496, 0.08885903656482697, 0.17527811229228973, 0.08804896473884583, 0.038310401141643524, -0.01704210229218006, -0.17355971038341522, -0.018237406387925148, 0.030551932752132416, -0.23085585236549377, 0.13475817441940308, 0.16338199377059937, -0.06968289613723755, -0.04330683499574661, 0.04434924200177193, 0.22637797892093658, 0.07463733851909637, -0.15070196986198425, -0.07500549405813217, 0.10863590240478516, -0.22288714349269867, 0.0010778247378766537, 0.057608842849731445, -0.12828609347343445, -0.17236559092998505, -0.23064571619033813, 0.09910193085670471, 0.46647992730140686, 0.0634111613035202, -0.13985536992549896, 0.052741192281246185, -0.1558966338634491, 0.022585246711969376, 0.10514408349990845, 0.11794176697731018, -0.06241249293088913, 0.06389056891202927, -0.14145469665527344, 0.060088545083999634, 0.09667345881462097, -0.004665130749344826, -0.07927791774272919, 0.21978208422660828, -0.0016187895089387894, 0.04876316711306572, 0.03137822449207306, 0.08962501585483551, -0.09108036011457443, -0.01795950159430504, -0.04094596579670906, 0.03533276170492172, 0.01394269522279501, -0.08244197070598602, -0.05095399543642998, 0.04305890575051308, -0.1195211187005043, 0.16731074452400208, 0.03894471749663353, -0.0222858227789402, -0.07944411784410477, 0.0614166259765625, -0.1481470763683319, -0.09113290905952454, 0.14758692681789398, -0.24051085114479065, 0.164126917719841, 0.1753545105457306, -0.003193420823663473, 0.20875433087348938, 0.03357946127653122, 0.1259773075580597, -0.00022807717323303223, -0.039092566817998886, -0.13582147657871246, -0.01937306858599186, 0.015938198193907738, 0.00787206832319498, 0.05792934447526932, 0.03294186294078827]
array_2 = [-0.1966051608324051, 0.0940953716635704, -0.0031937970779836178, -0.03691547363996506, -0.07240629941225052, -0.07114037871360779, -0.07133384048938751, -0.1283963918685913, 0.15377545356750488, -0.091400146484375, 0.10803385823965073, -0.09235749393701553, -0.1866973638534546, -0.021168243139982224, -0.09094691276550293, 0.07300164550542831, -0.20971564948558807, -0.1847742646932602, -0.009817334823310375, -0.05971141159534454, 0.09904412180185318, 0.0278592761605978, -0.012554554268717766, 0.09818517416715622, -0.1747943013906479, -0.31632938981056213, -0.0864541232585907, -0.13249783217906952, 0.002135572023689747, -0.04935726895928383, 0.010047778487205505, 0.04549024999141693, -0.26334646344184875, -0.05263081565499306, -0.013573898002505302, 0.2042253464460373, 0.06646320968866348, 0.08540669083595276, 0.12267164140939713, -0.018634958192706108, -0.19135263562202454, 0.01208433136343956, 0.09216200560331345, 0.2779296934604645, 0.1531585156917572, 0.10681629925966263, -0.021275708451867104, -0.059720948338508606, 0.06610126793384552, -0.21058350801467896, 0.005440462380647659, 0.18833838403224945, 0.08883830159902573, 0.025969548150897026, 0.0337764173746109, -0.1585341989994049, 0.02370697632431984, 0.10416869819164276, -0.19022507965564728, 0.11423652619123459, 0.09144753962755203, -0.08765758574008942, -0.0032832929864525795, -0.0051014479249715805, 0.19875964522361755, 0.07349056005477905, -0.1031823456287384, -0.10447365045547485, 0.11358538269996643, -0.24666038155555725, -0.05960353836417198, 0.07124857604503632, -0.039664581418037415, -0.20122921466827393, -0.31481748819351196, -0.006801256909966469, 0.41940364241600037, 0.1236235573887825, -0.12495145946741104, 0.12580059468746185, -0.02020396664738655, -0.03004150651395321, 0.11967054009437561, 0.09008713811635971, -0.07470540702342987, 0.09324200451374054, -0.13763070106506348, 0.07720538973808289, 0.19568027555942535, 0.036567769944667816, 0.030284458771348, 0.14119629561901093, -0.03820852190256119, 0.06232285499572754, 0.036639824509620667, 0.07704029232263565, -0.12276224792003632, -0.0035170004703104496, -0.13103705644607544, 0.027697769924998283, -0.01527332328259945, -0.04027168080210686, -0.03659897670149803, 0.03330300375819206, -0.12293602526187897, 0.09043421596288681, -0.019673841074109077, -0.07563626766204834, -0.13991905748844147, 0.014788001775741577, -0.07630413770675659, 0.00017269013915210962, 0.16345393657684326, -0.25710681080818176, 0.19869503378868103, 0.19393865764141083, -0.07422225922346115, 0.19553625583648682, 0.09189949929714203, 0.051557887345552444, -0.0008843056857585907, -0.006250975653529167, -0.1680600494146347, -0.10320111364126205, 0.03232177346944809, -0.08931156992912292, 0.11964476853609085, 0.00814182311296463]

The co-variance matrix of the above arrays turn out to be a singular matrix, and thus I am unable to inverse it. Why does it end up being a singular matrix?

EDIT 2: Solution

Since the co-variance matrix here is singular matrix, I had to pseudo inverse it using np.linalg.pinv(V).

Upvotes: 5

Views: 8011

Answers (1)

tel
tel

Reputation: 13999

From the numpy.cov docs, the first argument should be an array m such that:

Each row of m represents a variable, and each column a single observation of all those variables.

So to fix your code just take the transpose (with .T) of your array before you call cov:

V = np.cov(np.array([array_1, array_2]).T)
IV = np.linalg.inv(V)
print(mahalanobis(array_1, array_2, IV))

I just tested this out on some random data, and I can confirm it works.

Also, calculating covariance from just two observations is a bad idea, and not likely to be very accurate. If your data is coming from an image, you should use the entire image img (or at least the entire region of interest) when calculating the covariance matrix, then use that matrix to find the Mahalanobis distance between the two vectors of interest:

V = np.cov(np.array(img))
IV = np.linalg.inv(V)
print(mahalanobis(array_1, array_2, IV))

You may or may not need to replace img with img.T, depending on how you generated array_1 and array_2 in the first place.

If you're getting singular covariance matrices, what you have is a math problem, not a code problem. It's apparently a common enough problem that the question "why is my covariance matrix singular?" has already been asked and answered. Very broadly, it seems like it can happen when enough of your data points are "too similar", in some sense. I'd imagine using just two data points also makes this more likely.

Upvotes: 4

Related Questions