How to remove extra characters from a column value using Python

Question

I am trying to map the values in a column to a list of known values: if a field value matches an entry in the list, all extra characters should be removed from it. I can already match the values, but how can I remove the extra characters from the column?

Input Data

col_data

Indi8
United states / 08
UNITED Kindom (55)
ITALY 22
israel

Expected Output:

col_data

India
United States
United Kingdom
Italy
Israel

Script I am using:

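import numpy as np
from difflib import SequenceMatcher

match_val = ['India', 'United Kingdom', 'Israel', 'United States', 'Italy']

# Lower-cased copies so the comparison ignores case
lower = [x.lower() for x in match_val]

def nearest(s):
    # Index of the reference value most similar to s (SequenceMatcher ratio)
    idx = np.argmax([SequenceMatcher(None, s.lower(), ref).ratio() for ref in lower])
    return match_val[idx]

# df is the existing DataFrame that holds col_data
df['col_data'] = df['col_data'].apply(nearest)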

The script above matches each value against the list, but it does not remove the extra characters. How can I modify it so that the extra characters are removed as well after mapping?
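One possible approach, sketched here under stated assumptions rather than as the accepted answer: strip digits and punctuation with a regular expression first, then fuzzy-match the cleaned text against the list with `difflib.get_close_matches` and return the canonical spelling. If nothing is close enough, it falls back to the cleaned, title-cased text. The helper names `clean` and `clean_and_match` are hypothetical, and the sketch assumes the names contain only ASCII letters.

```python
import re
from difflib import get_close_matches

# Canonical names, taken from match_val in the question
MATCH_VAL = ['India', 'United Kingdom', 'Israel', 'United States', 'Italy']

# Map lower-cased names back to their canonical spelling
LOOKUP = {name.lower(): name for name in MATCH_VAL}


def clean(value):
    """Hypothetical helper: replace every run of non-letter, non-space
    characters with a space, then collapse repeated whitespace.
    Assumes ASCII names; accented letters would be stripped too."""
    letters_only = re.sub(r"[^A-Za-z ]+", " ", str(value))
    return " ".join(letters_only.split())


def clean_and_match(value, cutoff=0.6):
    """Hypothetical helper: return the closest canonical name, or the
    cleaned, title-cased text when no name reaches the cutoff."""
    cleaned = clean(value)
    matches = get_close_matches(cleaned.lower(), list(LOOKUP), n=1, cutoff=cutoff)
    return LOOKUP[matches[0]] if matches else cleaned.title()


if __name__ == "__main__":
    raw = ["Indi8", "United states / 08", "UNITED Kindom (55)", "ITALY 22", "israel"]
    print([clean_and_match(v) for v in raw])
    # ['India', 'United States', 'United Kingdom', 'Italy', 'Israel']
```

With pandas, the function can be applied to the column in the same way as `nearest`: `df['col_data'] = df['col_data'].apply(clean_and_match)`. A cutoff of 0.6 is difflib's default; lowering it accepts noisier input at the cost of more wrong matches.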
