user11555536

Remove similar character string duplicates from a dataframe

I have a DataFrame df which currently looks something like this:

Car Name      Number
Adam Leaf     9
Adamm Leaf    9
Adam Lea      NaN
Adam-Leaf     NaN
Adam/Leaf     9
Claire-Green  NaN
Cliare Green  3
Claire Green  3
Claire Gren   NaN
Claire/Green  3

I am trying to remove the near-duplicate spelling variations to get something like this:

Car Name      Number
Adam Leaf     9
Claire Green  3

Upvotes: 1

Views: 160

Answers (1)

BENY

Reputation: 323226

Here is one way, using the Soundex phonetic encoding from the jellyfish library:

import jellyfish

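# Group rows by the Soundex code of the name; first() keeps the first
# non-null value of each column, so the name and number may come from different rows.
s = df.groupby(df['Car Name'].apply(jellyfish.soundex)).first()
print(s)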
              Car Name  Number
Car Name                      
A354         Adam Leaf     9.0
C462      Claire-Green     3.0
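
Note that the result above keeps 'Claire-Green' rather than 'Claire Green', because it is the first spelling in that group. If the exact spelling matters, one option is to normalize the separators first and keep the most frequent spelling per group. The following is only a rough sketch of that idea, not part of the answer above: it swaps Soundex for the standard library's difflib.SequenceMatcher so it runs without pandas or jellyfish, and the names normalize, dedupe and the 0.8 threshold are my own assumptions that would need tuning on real data.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

# Sample rows mirroring the question's table; None stands in for NaN.
rows = [
    ("Adam Leaf", 9), ("Adamm Leaf", 9), ("Adam Lea", None),
    ("Adam-Leaf", None), ("Adam/Leaf", 9),
    ("Claire-Green", None), ("Cliare Green", 3), ("Claire Green", 3),
    ("Claire Gren", None), ("Claire/Green", 3),
]


def normalize(name):
    """Replace '-' and '/' separators with a space and collapse whitespace."""
    return re.sub(r"[-/\s]+", " ", name).strip()


def dedupe(rows, threshold=0.8):
    """Greedily cluster similar names (threshold is an assumed cutoff)."""
    clusters = []  # each cluster is a list of (normalized_name, number)
    for name, number in rows:
        norm = normalize(name)
        for cluster in clusters:
            # Compare against the first member of each existing cluster.
            anchor = cluster[0][0]
            ratio = SequenceMatcher(None, norm.lower(), anchor.lower()).ratio()
            if ratio >= threshold:
                cluster.append((norm, number))
                break
        else:
            # No sufficiently similar cluster found: start a new one.
            clusters.append([(norm, number)])

    result = []
    for cluster in clusters:
        # The most frequent normalized spelling represents the cluster.
        best_name = Counter(n for n, _ in cluster).most_common(1)[0][0]
        # First non-missing number, similar in spirit to groupby(...).first().
        best_number = next((x for _, x in cluster if x is not None), None)
        result.append((best_name, best_number))
    return result


deduped = dedupe(rows)
for name, number in deduped:
    print(name, number)
```

With a DataFrame, the rows could be fed in from df[['Car Name', 'Number']] after converting NaN to None. The greedy comparison against the first member of each cluster is order-dependent, so it is only suitable for small, fairly clean lists like the one in the question.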

Upvotes: 3
