H35am

Reputation: 818

Calculating Manhattan distance in Python without result

I have these two data frames in Python and I'm trying to calculate the Manhattan distance, and later the Euclidean distance, but I'm stuck on the Manhattan distance and can't figure out what is going wrong.
Here is what I have tried so far:

import pandas as pd

ratings = pd.read_csv("toy_ratings.csv", sep=",")
person1 = ratings[ratings['Person'] == 1]['Rating']
person2 = ratings[ratings['Person'] == 2]['Rating']

ratings.head()
   Person  Movie  Rating
0       1     11     2.5
1       1     12     3.5
2       1     15     2.5
3       3     14     3.5
4       2     12     3.5

Here is the data inside person1 and person2:

print("*****person1*****")
print(person1)

*****person1*****
0     2.5
1     3.5
2     2.5
5     3.0
22    3.5
23    3.0
36    5.0

print("*****person2*****")
print(person2)

*****person2*****
4     3.5
6     3.0
8     1.5
9     5.0
11    3.0
24    3.5

This was the function that I have tried to build without any luck:

def ManhattanDist(person1, person2):
    distance = 0
    for rating in person1:
        if rating in person2:
            distance += abs(person1[rating] - person2[rating])
            return distance

The problem is that the function returns 0, which is not correct. When I debug, I can see that it never enters the if branch inside the loop. How can I check that both rows have a value and then loop over them?

Upvotes: 0

Views: 6164

Answers (2)

pyano

Reputation: 1978

I think the function should return the distance in every case: either the distance is zero, as initialized, or it is something else. So the function should look like this:

def ManhattanDist(person1, person2):
    distance = 0
    for rating in person1:
        if rating in person2:
            distance += abs(person1[rating] - person2[rating])
    return distance
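
A possible reason the if branch is still never reached, even with the return moved: iterating over a pandas Series yields its values, while the in operator tests against its index labels. The sketch below rebuilds the two Series from the output printed in the question; the names iterated and hits are only for illustration.

```python
import pandas as pd

# Ratings copied from the question's printed output.
# The index labels are the original row numbers of the data frame.
person1 = pd.Series([2.5, 3.5, 2.5, 3.0, 3.5, 3.0, 5.0],
                    index=[0, 1, 2, 5, 22, 23, 36])
person2 = pd.Series([3.5, 3.0, 1.5, 5.0, 3.0, 3.5],
                    index=[4, 6, 8, 9, 11, 24])

# Iterating a Series yields the rating values, not the index labels.
iterated = list(person1)

# "x in series" checks the index labels, so each rating value is
# compared against row numbers (4, 6, 8, ...) and never matches here.
hits = [rating for rating in person1 if rating in person2]

print(iterated)
print(hits)
```

So the loop compares ratings with row numbers, which is why the distance stays at its initial value.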

I think the distance should be computed from two vectors of the same length (at least I cannot imagine anything else). If that is the case, you can do the following (without your function):

import numpy as np

p1 = np.array(person1)
p2 = np.array(person2)

#--- scalar product as similarity indicator
dist1 = np.dot(p1,p2)

#--- Euclidean distance
dist2 = np.linalg.norm(p1-p2)

#--- Manhattan distance
dist3 = np.sum(np.abs(p1-p2))
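
The array version needs p1 and p2 to have the same length, which the two Series printed in the question (7 and 6 ratings) do not have. One way to get matching vectors, assuming the distance should be taken over the movies that both people rated, is to align the ratings on the Movie column first. This is only a sketch: the toy rows and the helper name manhattan_on_shared_movies are made up for illustration, and it assumes each person rates a given movie at most once.

```python
import pandas as pd

# Hypothetical toy data with the same columns as the question's file.
ratings = pd.DataFrame({
    "Person": [1, 1, 1, 2, 2, 2],
    "Movie":  [11, 12, 15, 12, 15, 16],
    "Rating": [2.5, 3.5, 2.5, 3.5, 1.5, 5.0],
})

def manhattan_on_shared_movies(ratings, a, b):
    """Sum of |rating_a - rating_b| over the movies rated by both a and b."""
    # Index each person's ratings by movie so the two Series can be aligned.
    ra = ratings[ratings["Person"] == a].set_index("Movie")["Rating"]
    rb = ratings[ratings["Person"] == b].set_index("Movie")["Rating"]
    # Subtraction aligns on the index; movies rated by only one person
    # produce NaN and are dropped.
    diff = (ra - rb).dropna()
    return diff.abs().sum()

distance = manhattan_on_shared_movies(ratings, 1, 2)
print(distance)  # 1.0 (movies 12 and 15 are shared: |3.5-3.5| + |2.5-1.5|)
```

The Euclidean distance over the same shared movies would follow the same pattern, with the squared differences summed and the square root taken at the end.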

Upvotes: 2

Tim Seed

Reputation: 5279

Your function returns a single value. It should (I guess) return a list of values.

Upvotes: 0
