P H

Reputation: 344

Which algorithm should I use to match patterns or find intersections between datasets?

I have personID and VaccinationsID plotted on the x and y axes. I want to group the personIDs that have the most similar selection of vaccinations. I am trying to use a clustering algorithm, but I am not sure whether I should use clustering or user-based collaborative filtering.

My aim is to use the Jaccard index, that is, to measure the intersection (similarity) between the vaccination sets of tens of thousands of persons, then form clusters and label them. Based on the degree of similarity, I need to group the personIDs. Could anyone tell me which approach is efficient? Also, is clustering feasible for millions of records?
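To make the similarity measure concrete, here is a minimal sketch of the Jaccard index between the vaccination sets of two persons. The IDs (`p1`, `v1`, ...) and the dictionary layout are made up for illustration; the real data may be shaped differently.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two sets; 1.0 if both are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)

# Hypothetical data: personID -> set of VaccinationsIDs
vaccinations = {
    "p1": {"v1", "v2", "v3"},
    "p2": {"v1", "v2", "v3"},
    "p3": {"v1", "v4"},
}

# Similarity for every unordered pair of persons
pairwise = {
    (p, q): jaccard(vaccinations[p], vaccinations[q])
    for p, q in combinations(sorted(vaccinations), 2)
}

for pair, score in pairwise.items():
    print(pair, score)
```

Note that the all-pairs loop grows quadratically with the number of persons, so it is only practical as-is for small samples, not for millions of records.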

I have added the screenshot of the graph

Upvotes: 0

Views: 165

Answers (2)

P H

Reputation: 344

After a lot of analysis, I used the K-modes clustering algorithm, which forms clusters based on the dissimilarity between records. Below is a link to a video explaining how the K-modes algorithm works. [https://www.youtube.com/watch?v=b39_vipRkUo]
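For readers who want to see the idea in code, here is a minimal pure-Python sketch of K-modes on binary "has this vaccination" vectors. It is not the code I actually ran: the data, the farthest-first initialisation, and the function names are assumptions chosen to keep the example small and deterministic. For real data, a maintained implementation such as the `kmodes` package on PyPI is probably the better choice.

```python
from collections import Counter

def hamming(a, b):
    """Simple matching dissimilarity: number of positions where a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(rows, k, max_iter=20):
    """Minimal K-modes. Returns (labels, modes) for a list of equal-length tuples."""
    # Farthest-first initialisation (an assumption, keeps the example deterministic)
    modes = [rows[0]]
    while len(modes) < k:
        modes.append(max(rows, key=lambda r: min(hamming(r, m) for m in modes)))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest mode
        new_labels = [min(range(k), key=lambda j: hamming(r, modes[j])) for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each mode becomes the column-wise most frequent value
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                modes[j] = tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
    return labels, modes

# Hypothetical data: one row per person, one 0/1 column per vaccination
rows = [
    (1, 1, 0, 0),
    (1, 1, 1, 0),
    (0, 0, 1, 1),
    (0, 0, 0, 1),
]

labels, modes = k_modes(rows, k=2)
print(labels)  # -> [0, 0, 1, 1]
```

The first two persons share most vaccinations and end up in one cluster, the last two in the other. Like K-means, the result depends on the initial modes, so different initialisations can give different clusters.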

Upvotes: 0

Has QUIT--Anony-Mousse

Reputation: 77495

The number of vaccinations is an integer.

Just partition your data by this value; there is no need for clustering.

Everybody who has 7 vaccinations goes into list 7.
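A minimal sketch of that partitioning, assuming the data is available as a mapping from personID to a set of vaccination IDs (the names and values below are made up):

```python
from collections import defaultdict

# Hypothetical data: personID -> set of VaccinationsIDs
vaccinations = {
    "p1": {"v1", "v2", "v3"},
    "p2": {"v2", "v3", "v4"},
    "p3": {"v1", "v4"},
}

# One list per vaccination count
by_count = defaultdict(list)
for person, vax in vaccinations.items():
    by_count[len(vax)].append(person)

for count in sorted(by_count):
    print(count, by_count[count])
```

This is a single pass over the data, so it scales to millions of rows. It groups people by how many vaccinations they have, not by which ones.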

Upvotes: 0
