Dedupe Library issue with csv file

Question

I am trying to learn the dedupe library by running one very small example, but I am getting an error. Please help.

import dedupe
from Levenshtein import distance
# Define similarity functions - customize based on your matching criteria
def name_similarity(s1, s2):
    # Implement your name comparison logic here (e.g., Levenshtein distance, etc.)
    distance1 = distance(s1, s2)
    similarity = 1 - (distance1 / max(len(s1), len(s2)))  # Normalize distance to 0-1 similarity
    return similarity


if __name__ == '__main__':
    # Sample data (dict of records keyed by record ID)
    data = {18709931: {'id': '18709931', 'name': 'TEST', 'ent_num': '8256364', 'ent_nm_txt': 'TST Corporation'},
            18484906: {'id': '18484906', 'name': 'VESTCOM', 'ent_num': '8256364', 'ent_nm_txt': 'TST Corporation'},
            18709961: {'id': '18709961', 'name': 'TESTMATERIALS', 'ent_num': '8256364', 'ent_nm_txt': 'TST Corporation'},
            19415694: {'id': '19415694', 'name': 'TEST', 'ent_num': '8256364', 'ent_nm_txt': 'TST Corporation'}}




    # Define a schema
    fields = [
        {'field': 'name', 'type': 'Custom', 'comparator': name_similarity},
        {'field': 'ent_num', 'type': 'Exact'},

    ]

    # Initialize a deduper
    deduper = dedupe.Dedupe(fields)

    # Initialize the active learner with the sample data
    deduper.prepare_training(data)

    # Active learning loop
    dedupe.console_label(deduper)

    # Train the deduper
    deduper.train()

    # Save the trained model to disk
    with open('dedupe_model.pickle', 'wb') as f:
        deduper.write_settings(f)

Error I am getting when running the training:

Traceback (most recent call last):
  File "C:\Python_Projects\Python_extra_code\test_dedupe_code.py", line 30, in <module>
    deduper.prepare_training(data)
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\api.py", line 1424, in prepare_training
    self.active_learner = labeler.DedupeDisagreementLearner(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\labeler.py", line 430, in __init__
    self.mark(examples, labels)
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\labeler.py", line 391, in mark
    learner.fit(self.pairs, self.y)
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\labeler.py", line 117, in fit
    self.current_predicates = self.block_learner.learn(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\training.py", line 58, in learn
    coverable_dupes = frozenset.union(*match_cover.values())
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: unbound method frozenset.union() needs an argument

Process finished with exit code 1
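
The last frame of the traceback calls frozenset.union(*match_cover.values()). In plain Python, that call raises this kind of TypeError whenever the dict is empty, because unpacking an empty collection leaves the unbound method with no arguments at all. Below is a minimal sketch that reproduces the behaviour without dedupe. The name match_cover is taken from the traceback; union_all and safe_union are hypothetical helpers written only for illustration, and the idea that the four-record sample leaves match_cover empty is an assumption, not something confirmed from the dedupe source.

```python
def union_all(cover):
    """Mirror the call shown in the last traceback frame."""
    return frozenset.union(*cover.values())


def safe_union(cover):
    """Hypothetical guard: call union on an empty frozenset instance,
    so an empty dict yields an empty result instead of an error."""
    return frozenset().union(*cover.values())


# Assumption: with very little data, nothing ends up in match_cover.
match_cover = {}

try:
    union_all(match_cover)
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")

# With at least one value, the same call works as intended.
non_empty = {"p1": frozenset({1, 2}), "p2": frozenset({2, 3})}
print(union_all(non_empty))

# The guarded variant tolerates the empty case.
print(safe_union(match_cover))
```

If the assumption holds, the problem would be the size of the sample rather than the field definitions, so trying a larger and more varied set of records may be a reasonable next step.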
