usct01

Reputation: 898

How to use pre-labeled training data with Python Dedupe

I am using the Python Dedupe package for a record linkage task: matching company names in one data set to those in another.

The Dedupe package lets the user label pairs to train a logistic regression model. However, this is a manual process: one needs to enter y/n for each pair shown on screen.

I want to load a training file that has three columns, say Company 1, Company 2, and Match, where Match is yes if Company 1 and Company 2 are the same company and no if they are different.

I am following this source code but couldn't find a way to load a file for training.

Also, the documentation shows that one can change the default classifier, but I am not sure how to do this.

Can anyone please help me with this?

Upvotes: 2

Views: 891

Answers (1)

Philip Cooper

Reputation: 1

Look up the trainingDataLink function in the dedupe documentation. It’s designed to handle pre-labeled data for record linkage.
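
For a file laid out as in the question (Company 1, Company 2, Match), one possible approach is to convert the rows into the labeled-pairs structure that dedupe uses for training: a dict with a "match" list and a "distinct" list, each holding pairs of records. The sketch below uses only the standard library. The helper name load_labeled_pairs, the field name "name", and the sample rows are assumptions made for illustration; the field name has to agree with whatever field definition the linker was created with.

```python
import csv
import io


def load_labeled_pairs(csv_file, field="name"):
    """Convert rows of (Company 1, Company 2, Match) into a dict with
    'match' and 'distinct' lists of record pairs.

    `field` is a hypothetical field name; it must agree with the field
    definition used when constructing the linker.
    """
    labeled = {"match": [], "distinct": []}
    for row in csv.DictReader(csv_file):
        # Each record is a dict keyed by field name, one per data set.
        pair = (
            {field: row["Company 1"].strip()},
            {field: row["Company 2"].strip()},
        )
        label = row["Match"].strip().lower()
        if label == "yes":
            labeled["match"].append(pair)
        elif label == "no":
            labeled["distinct"].append(pair)
        else:
            raise ValueError(f"Unexpected Match value: {row['Match']!r}")
    return labeled


# Illustrative in-memory data standing in for the real training file.
sample = io.StringIO(
    "Company 1,Company 2,Match\n"
    "Acme Inc,ACME Incorporated,yes\n"
    "Acme Inc,Apex Ltd,no\n"
)
labeled_pairs = load_labeled_pairs(sample)
print(len(labeled_pairs["match"]), len(labeled_pairs["distinct"]))
```

The resulting dict can then be handed to the linker's pair-marking method before calling train(). As far as I know this is markPairs in dedupe 1.x and mark_pairs in dedupe 2.x, so check the documentation for the installed version.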

Upvotes: 0
