Reputation: 31
I am aware that Dedupe uses active learning to remove duplicates and perform record linkage.
However, I would like to know whether we can pass an Excel sheet of already matched pairs (labeled data) as the input for active learning?
Upvotes: 3
Views: 1180
Reputation: 3249
Not directly.
You'll need to get your data into a format that markPairs can consume.
Something like:
# Each entry is a tuple of two record dicts that were labeled together.
labeled_examples = {
    'match': [],
    'distinct': [({'name': 'Georgie Porgie'},
                  {'name': 'Georgette Porgette'})],
}
deduper.markPairs(labeled_examples)
We do provide a convenience function for getting spreadsheet data into this format: trainingDataDedupe.
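If the sheet already lists one labeled pair per row, a hand-rolled conversion is also possible. The following is only a rough sketch under several assumptions: the sheet has been exported to CSV, the two records of a pair sit in columns suffixed `_a` and `_b`, and a `label` column holds either `match` or `distinct`. The helper name `labeled_pairs_from_rows`, the column layout, and the sample rows are all made up for illustration and are not part of dedupe.

```python
import csv
import io


def labeled_pairs_from_rows(rows, fields, label_column="label"):
    """Build a {'match': [...], 'distinct': [...]} dict from pair rows.

    Hypothetical helper: assumes each row carries both records of a pair
    in columns named <field>_a and <field>_b, plus a label column.
    """
    labeled = {"match": [], "distinct": []}
    for row in rows:
        record_a = {field: row[field + "_a"] for field in fields}
        record_b = {field: row[field + "_b"] for field in fields}
        label = row[label_column].strip().lower()
        if label not in labeled:
            raise ValueError("unexpected label: %r" % label)
        labeled[label].append((record_a, record_b))
    return labeled


# Stand-in for a sheet exported to CSV (sample data invented for this sketch).
sheet = io.StringIO(
    "name_a,name_b,label\n"
    "Georgie Porgie,Georgette Porgette,distinct\n"
    "Georgie Porgie,Georgie Porgy,match\n"
)

labeled_examples = labeled_pairs_from_rows(csv.DictReader(sheet), fields=["name"])

# The result has the same shape as the dict above and could then be passed on:
# deduper.markPairs(labeled_examples)
```

The record dicts should use the same field names as the fields the deduper was set up with, so rename the columns accordingly.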
(I am an author of dedupe)
Upvotes: 2