Reputation: 755
I have a dataframe (df) and am trying to append data to specific rows:
Index Fruit Rank
0 banana 1
1 apple 2
2 mango 3
3 Melon 4
The goal is to compare the Fruit at Rank 1 to the Fruit at each other rank and append the resulting score to that row. I'm using difflib.SequenceMatcher to make the comparison. Right now I'm able to add a column to df, but I end up with the same value in every row. I'm struggling with the loop and the append. Any pointers would be much appreciated.
Here is some of my code:
new_entry = df[(df.Rank ==1)]
new_fruit = new_entry['Fruit']
prev_entry = df[(df.Rank ==2)]
prev_fruit = prev_entry['Fruit']
similarity_score = difflib.SequenceMatcher(None, str(new_fruit).lower(), str(prev_fruit).lower()).ratio()
df['similarity_score'] = similarity_score
The result is something like this:
Index Fruit Rank similarity_score
0 banana 1 0.3
1 apple 2 0.3
2 mango 3 0.3
3 Melon 4 0.3
The desired result is:
Index Fruit Rank similarity_score
0 banana 1 n/a
1 apple 2 0.4
2 mango 3 0.5
3 Melon 4 0.6
Thanks.
Upvotes: 0
Views: 196
Reputation: 2917
This won't reproduce the exact scores in your example, but it calculates the SequenceMatcher ratio between the Rank 1 value ('banana') and the Fruit in each row, and adds the result as a column.
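import difflib
import pandas as pd

df = pd.DataFrame({'Fruit': ['banana', 'apple', 'mango', 'melon'],
                   'Rank': [1, 2, 3, 4]})
# Fruit of the Rank 1 row; .iloc[0] picks it by position, whatever its index label
top = df.loc[df['Rank'] == 1, 'Fruit'].iloc[0]
# Compare every Fruit against the Rank 1 value, one row at a time
df['similarity_score'] = df['Fruit'].apply(
    lambda x: difflib.SequenceMatcher(None, top, x).ratio())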
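If you also want the Rank 1 row to show n/a and the comparison to ignore case (as the question's code does with .lower()), something along these lines may work. It is only a sketch: the helper name score is made up for illustration, and NaN is assumed to be an acceptable stand-in for "n/a". It also avoids two likely causes of the repeated value in the question: str() on a Series returns its printed form (index and dtype text included) rather than the fruit name, and assigning a single number to a column copies it into every row.

```python
import difflib

import numpy as np
import pandas as pd

df = pd.DataFrame({'Fruit': ['banana', 'apple', 'mango', 'Melon'],
                   'Rank': [1, 2, 3, 4]})

# Reference string: the Fruit of the Rank 1 row, lowercased.
top = df.loc[df['Rank'] == 1, 'Fruit'].iloc[0].lower()


def score(fruit):
    # Hypothetical helper: case-insensitive ratio against the Rank 1 fruit.
    return difflib.SequenceMatcher(None, top, fruit.lower()).ratio()


# apply() calls score once per row, so each row gets its own value.
df['similarity_score'] = df['Fruit'].apply(score)

# Assumption: NaN stands in for "n/a" on the Rank 1 row itself.
df.loc[df['Rank'] == 1, 'similarity_score'] = np.nan

print(df)
```

The scores will still differ from the 0.4 / 0.5 / 0.6 values in the question, since those look like placeholders rather than real SequenceMatcher output.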
Upvotes: 1