Reputation: 21

How can build a model that will predict duplicate records in your dataset?

What are the algorithms that will predict duplicates in your dataset. For example -

Name Marks
A    100
B     90
C     80
A    100

I need something like this -

Name  Marks S/D
A     100   Single
B     90    Single
C     80    Single
A     100   Duplicate

I'm looking for some algorithms that can help in this case.

Upvotes: 2

Answers (1)

I'mahdi

Reputation: 24049

IIUC, you need this:

import pandas as pd
df = pd.DataFrame({'Name':['A','B','C','A'],'Marks': [100, 90, 80, 100]})

df['res'] = df.duplicated().map({False:"Single", True:"Duplicated"})

Output:

>>> df
  Name  Marks         res
0    A    100      Single
1    B     90      Single
2    C     80      Single
3    A    100  Duplicated

Upvotes: 1

How can build a model that will predict duplicate records in your dataset?

Answers (1)

Related Questions