BlackHat

Reputation: 755

Python dataframes

I have a DataFrame (df) and I'm trying to add a value to each row:

Index  Fruit   Rank
0      banana  1
1      apple   2
2      mango   3
3      Melon   4

The goal is to compare the Fruit at Rank 1 with the Fruit at every other rank and store each score in its own row. I'm using difflib.SequenceMatcher for the comparison. Right now I can add a column to df, but every row ends up with the same value. I'm struggling with the loop and the assignment. Any pointers would be much appreciated.

Here is some of my code:

new_entry = df[(df.Rank ==1)]
new_fruit = new_entry['Fruit']

prev_entry = df[(df.Rank ==2)]
prev_fruit = prev_entry['Fruit']


similarity_score = difflib.SequenceMatcher(None, str(new_fruit).lower(), str(prev_fruit).lower()).ratio()

df['similarity_score'] = similarity_score

The result is something like this:

Index  Fruit   Rank  similarity_score
0      banana  1     0.3
1      apple   2     0.3
2      mango   3     0.3
3      Melon   4     0.3

The desired result is:

Index  Fruit   Rank  similarity_score
0      banana  1     n/a
1      apple   2     0.4
2      mango   3     0.5
3      Melon   4     0.6

Thanks.

Upvotes: 0

Views: 196

Answers (1)

bananafish

Reputation: 2917

This won't reproduce the exact numbers in your desired output, but it computes the SequenceMatcher ratio between the rank 1 value ('banana') and the Fruit in each row, and adds the result as a new column.

import pandas as pd
import difflib

df = pd.DataFrame({'Fruit': ['banana', 'apple', 'mango', 'melon'],
                   'Rank': [1, 2, 3, 4]})

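# Pull out the rank 1 fruit as a plain string; .iloc[0] takes it by position, whatever its index label.
top = df['Fruit'][df.Rank == 1].iloc[0]
# Score every row's fruit against it, one ratio per row.
df['similarity_score'] = df['Fruit'].apply(lambda x: difflib.SequenceMatcher(
                                           None, top, x).ratio())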
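
If you also want the rank 1 row to show as missing instead of being compared with itself, and want to keep the case-insensitive comparison from the question, something like the following might work. It's only a sketch: `ratio_to_top` is a helper name I made up, and using NaN to stand in for "n/a" is an assumption on my part. It also avoids what I suspect went wrong in the original code: calling str() on a one-element Series gives the printed Series rather than the fruit name, and assigning a single number to a column copies it to every row.

```python
import difflib

import numpy as np
import pandas as pd

df = pd.DataFrame({'Fruit': ['banana', 'apple', 'mango', 'Melon'],
                   'Rank': [1, 2, 3, 4]})

# Take the rank 1 fruit as a plain lowercase string, not a one-element Series.
top = df.loc[df['Rank'] == 1, 'Fruit'].iloc[0].lower()


def ratio_to_top(fruit):
    """Hypothetical helper: similarity of one fruit name to the rank 1 fruit."""
    return difflib.SequenceMatcher(None, top, fruit.lower()).ratio()


# apply() calls the helper once per row, so each row gets its own score.
df['similarity_score'] = df['Fruit'].apply(ratio_to_top)

# Assumption: the rank 1 row has nothing to be compared with, so mark it as NaN.
df.loc[df['Rank'] == 1, 'similarity_score'] = np.nan

print(df)
```

The scores come straight from SequenceMatcher, so they won't match the illustrative 0.4 / 0.5 / 0.6 in the question.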

Upvotes: 1
