Reputation: 898
I am using the Python dedupe package for record linkage, that is, matching company names in one data set to those in another.
The dedupe package lets the user label pairs to train a logistic regression model. However, this is a manual process: one needs to enter y/n for each pair shown on screen.
I want to load a training file that has three columns, say Company 1, Company 2, and Match, where Match is yes if Company 1 and Company 2 are the same and no if they are different.
I am following this source code but couldn't find a way to load a file for training.
Also, the documentation shows that the default classifier can be changed, but I am not sure how to do this.
Can anyone please help me with this?
Upvotes: 2
Views: 891
Reputation: 1
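Look up the trainingDataLink function in the dedupe documentation. It's designed to handle pre-labeled data for record linkage: as far as I know, it builds training pairs from two already-linked data sets that share a common key, and the result can be passed to markPairs.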
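If the labels are in a three-column file like the one described in the question, one option is to build the labeled-pairs dictionary yourself. This is only a rough sketch, not something taken from the dedupe docs verbatim: it assumes the linker accepts a dict with "match" and "distinct" lists of record pairs, that the model has a single field called "name" (a made-up field name), and that the CSV headers are exactly Company 1, Company 2 and Match. The helper name load_labeled_pairs is hypothetical too.

```python
import csv
import io


def load_labeled_pairs(csv_file, field="name"):
    """Turn rows of (Company 1, Company 2, Match) into a dict with
    'match' and 'distinct' lists of record pairs.

    `field` is an assumed field name; it should match whatever field
    the dedupe model was defined with.
    """
    labeled = {"match": [], "distinct": []}
    for row in csv.DictReader(csv_file):
        # Each record is a dict keyed by field name, one per data set.
        pair = (
            {field: row["Company 1"].strip()},
            {field: row["Company 2"].strip()},
        )
        label = row["Match"].strip().lower()
        if label in ("yes", "y"):
            labeled["match"].append(pair)
        elif label in ("no", "n"):
            labeled["distinct"].append(pair)
        else:
            raise ValueError("Unexpected Match value: %r" % row["Match"])
    return labeled


# Small in-memory sample standing in for the real training file.
sample = io.StringIO(
    "Company 1,Company 2,Match\n"
    "Acme Inc,ACME Incorporated,yes\n"
    "Acme Inc,Apex Ltd,no\n"
)
labeled_pairs = load_labeled_pairs(sample)
print(labeled_pairs)

# With a RecordLink object (here called `linker`, not created in this
# sketch), the dict would then be handed over instead of console labeling:
# linker.markPairs(labeled_pairs)  # assumed name; may be mark_pairs in newer releases
```

For a real file, replace the StringIO sample with open("training.csv", newline=""). Check the method name and the expected record format against the version of dedupe you have installed before relying on this.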
Upvotes: 0