Reputation: 1071
sample data
sample = ({'name':['Delinquency Rate','Cumulative Probability'],
'value':['Dlnqy','Prbblty'],
'new_name':['Dlnqncy Rt','Cmltv Prbblty']})
test = pd.DataFrame(sample)
test
new_name is created by removing all the vowels from 'name'. I want to compare first 3 characters of 'value' and first 3 character of 'each of the word' in 'new_name' and if they match I want to return the value from the name column. For example 'Dln' in value exists in 'Dlnqncy' of new_name, so we will return 'Delinquency' from name. The logic should work like
'if value[:3] in new_name[:3] then return name'
Following is what I have so far, which works fine if I have only two columns. But it doesn't work if I want to compare value and new_name and retun name.
def get_matches(name, value, new_name, default=''):
return next( (word for word in new_name.split() if str(value)[:3] in word[:3]),default)
test['match'] = test[['name', 'value', 'new_name']].apply(lambda row: get_matches(*row, default=' '), axis=1)
In the following table the column 'match' should have 'Delinquency' and 'Probability' (as it is there in the 'name' column)
Upvotes: 0
Views: 382
Reputation: 355
Here you go:
(test
.assign(match=lambda x: np.where((x.new_name.str[:3] == x.value.str[:3]), x.name, x.value))
)
Upvotes: 1