Reputation: 5097
I have a dataframe df which looks like:
id colour response
1 blue curent
2 red loaning
3 yellow current
4 green loan
5 red currret
6 green loan
You can see that the values in the response column are not uniform, and I would like to snap them to a standardized set of responses.
I also have a validation list, validate, which looks like:
validate
current
loan
transfer
I would like to standardise the response column in df by matching the first three characters of each entry against the validate list.
So the eventual output would look like:
id colour response
1 blue current
2 red loan
3 yellow current
4 green loan
5 red current
6 green loan
I have tried using fnmatch:
import fnmatch
pattern = 'cur*'
fnmatch.filter(df.response, pattern)  # these are the entries that should become 'current'
but I can't work out how to change those values in df.
If anyone could offer assistance, it would be appreciated.
Thanks
Reputation: 323226
Fuzzy match?
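from fuzzywuzzy import process
a=[]
# val.validate holds the validation values (current, loan, transfer)
for x in df.response:
    # extract(..., limit=1) returns a list with the single best (match, score, ...) tuple;
    # [0][0] picks out the matched string
    a.append(process.extract(x, val.validate, limit=1)[0][0])
df['response2']=a
df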
Out[867]:
id colour response response2
0 1 blue curent current
1 2 red loaning loan
2 3 yellow current current
3 4 green loan loan
4 5 red currret current
5 6 green loan loan
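If installing fuzzywuzzy is not an option, the standard library's difflib can do similar fuzzy matching. This is a minimal sketch, not part of the answer above: the helper name snap is hypothetical, and the 0.6 cutoff is simply difflib's default. difflib scores similarity with its own ratio rather than fuzzywuzzy's scorer, so on other data the two may pick different matches.

```python
import difflib


def snap(value, choices, cutoff=0.6):
    """Return the entry in choices most similar to value, or None if none clears cutoff."""
    # get_close_matches returns up to n candidates sorted by similarity, best first
    matches = difflib.get_close_matches(value, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else None


validate = ['current', 'loan', 'transfer']
responses = ['curent', 'loaning', 'current', 'loan', 'currret', 'loan']
print([snap(r, validate) for r in responses])
# ['current', 'loan', 'current', 'loan', 'current', 'loan']
```

On the DataFrame this would be applied as df['response2'] = df.response.map(lambda x: snap(x, validate)). Unlike always taking the top hit, snap returns None when nothing is similar enough, which makes unmatched responses easy to spot.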
Reputation: 76917
You could use map with a prefix-to-value mapping:
In [3664]: mapping = dict(zip(s.str[:3], s))
In [3665]: df.response.str[:3].map(mapping)
Out[3665]:
0 current
1 loan
2 current
3 loan
4 current
5 loan
Name: response, dtype: object
In [3666]: df['response2'] = df.response.str[:3].map(mapping)
In [3667]: df
Out[3667]:
id colour response response2
0 1 blue curent current
1 2 red loaning loan
2 3 yellow current current
3 4 green loan loan
4 5 red currret current
5 6 green loan loan
Where s is the Series of validation values:
In [3650]: s
Out[3650]:
0 current
1 loan
2 transfer
Name: validate, dtype: object
Details
In [3652]: mapping
Out[3652]: {'cur': 'current', 'loa': 'loan', 'tra': 'transfer'}
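Under the hood this is just a dictionary lookup on the three-character prefix. One caveat with dict(zip(s.str[:3], s)) is that if two validation values share a prefix (say "loan" and "loaner"), the dict silently keeps only the last one. Here is a plain-Python sketch that raises an error instead; the helper name build_prefix_map is hypothetical, and the fallback to the original response for unknown prefixes is an addition, not something the answer does.

```python
def build_prefix_map(values, width=3):
    """Map the first `width` characters of each value to the value itself.

    Raises ValueError when two different values share a prefix, since a
    plain dict would silently keep only the last of them.
    """
    mapping = {}
    for v in values:
        key = v[:width]
        if key in mapping and mapping[key] != v:
            raise ValueError(f"prefix {key!r} is shared by {mapping[key]!r} and {v!r}")
        mapping[key] = v
    return mapping


validate = ['current', 'loan', 'transfer']
mapping = build_prefix_map(validate)
responses = ['curent', 'loaning', 'current', 'loan', 'currret', 'loan']
# Unknown prefixes fall back to the original response instead of going missing
snapped = [mapping.get(r[:3], r) for r in responses]
print(mapping)   # {'cur': 'current', 'loa': 'loan', 'tra': 'transfer'}
print(snapped)   # ['current', 'loan', 'current', 'loan', 'current', 'loan']
```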
The mapping can also be a Series. map looks up each value in the Series index, so the prefixes go in the index:
In [3678]: pd.Series(s.values, index=s.str[:3].values)
Out[3678]:
cur current
loa loan
tra transfer
dtype: object
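Here is a self-contained version of the same approach that writes the result back into response, as the question asks. The fillna fallback, which keeps any response whose prefix matches nothing in validate instead of leaving NaN, is an extra step not in the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'colour': ['blue', 'red', 'yellow', 'green', 'red', 'green'],
    'response': ['curent', 'loaning', 'current', 'loan', 'currret', 'loan'],
})
s = pd.Series(['current', 'loan', 'transfer'], name='validate')

# Series mapping: the index holds the 3-character prefixes, the values hold
# the standardized responses that map() will return
mapping = pd.Series(s.values, index=s.str[:3].values)

# Prefixes missing from the mapping come back as NaN; fall back to the original text
df['response'] = df.response.str[:3].map(mapping).fillna(df.response)
print(df)
```

Because map needs the prefixes in the index, building the Series the other way round (values as the index) would make every lookup come back NaN.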