Ison

Reputation: 403

How to calculate percentage properly

I have three dataframes that each have a column "City". All three dataframes have a different set of city names.

I want to find the percentage of city names that match between each pair of dataframes.

For this purpose I used set() and got three sets:

set1 = set(df1['City'])
set2 = set(df2['City'])
set3 = set(df3['City'])

But how should I find the percentage? I used these expressions, but I'm not sure they are right:

(len(set1) - len(set2))/len(set1)*100
(len(set1) - len(set3))/len(set1)*100
(len(set2) - len(set3))/len(set2)*100

Is this calculation correct?

Upvotes: 0

Views: 54

Answers (2)

Claire

Reputation: 17

From the purely mathematical side of things: I assume that you want to find the percentage of cities matching between set1 & set2, set1 & set3, and set2 & set3 respectively.

To calculate this percentage, you need the number of matches and the size of the set you are comparing against.

Then the percentage can be calculated as follows:

Percentage match 1 & 2 = [(number of matches between 1 & 2)/(length of the set)]*100

For the code side of things: I agree with Sparkofska.
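
As a rough sketch of the formula above, assuming "length of the set" means the size of whichever set you pick as the reference (the function name match_percentage and the sample cities are made up for illustration):

```python
def match_percentage(reference, other):
    """Percentage of items in `reference` that also appear in `other`."""
    if not reference:
        # Avoid dividing by zero when the reference set is empty.
        return 0.0
    matches = reference & other  # set intersection
    return len(matches) / len(reference) * 100


# Hypothetical sample data standing in for set(df1['City']) etc.
set1 = {"Berlin", "Paris", "Rome", "Madrid"}
set2 = {"Paris", "Rome", "Vienna"}

share_of_set1 = match_percentage(set1, set2)
share_of_set2 = match_percentage(set2, set1)
print(share_of_set1, share_of_set2)
```

Note that this measure is not symmetric: with the sample data, the two shared cities make up a different share of set1 than of set2, so the result depends on which set is the reference.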

Upvotes: 0

Sparkofska

Reputation: 1320

You probably want this:

percentage = ( len(set1.intersection(set2)) / len(set1.union(set2)) )*100

which gives you the percentage of all distinct elements of set1 and set2 that appear in both sets.

This is also known as the Jaccard index, a measure of the similarity of two sets.
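
To apply this to all three pairs at once, something like the following sketch might work. The helper name jaccard_percentage and the sample cities are assumptions for illustration; in practice the sets would come from set(df1['City']) and so on.

```python
from itertools import combinations


def jaccard_percentage(a, b):
    """Jaccard index of two sets, expressed as a percentage."""
    union = a | b
    if not union:
        # Both sets are empty; treat the similarity as 0 to avoid dividing by zero.
        return 0.0
    return len(a & b) / len(union) * 100


# Hypothetical sample data standing in for the three City columns.
city_sets = {
    "set1": {"Berlin", "Paris", "Rome", "Madrid"},
    "set2": {"Paris", "Rome", "Vienna"},
    "set3": {"Rome", "Oslo"},
}

results = {}
# combinations(..., 2) yields each unordered pair exactly once.
for (name_a, a), (name_b, b) in combinations(city_sets.items(), 2):
    results[(name_a, name_b)] = jaccard_percentage(a, b)
    print(f"{name_a} & {name_b}: {results[(name_a, name_b)]:.1f}%")
```

Unlike a ratio against a single set, this value is the same whichever set comes first, which makes it convenient for pairwise comparison.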

Upvotes: 1
