Loop Through DataFrame to Select Specific Cells in Python

Question

I am trying to write a loop over my DataFrame in Python so that I can compare text documents using a CountVectorizer-based method and other similar comparison functions.

I have data on movie franchises and want to compare the plot of each sequel to the original film in its franchise, as well as to the previous film in the franchise. I have attached a snippet of the data. For example, I want Seq 1 in FranID 1 to be compared to Seq 0 in FranID 1, and for this to continue for each sequel and franchise: Seq 2, 3, 4, 5, etc. would each be compared to Seq 0 within the same FranID.

In addition, I would want a separate loop that compares each sequel to the previous film within each franchise. For example, I want to compare Seq 1 to Seq 0, Seq 2 to Seq 1, and so on.

Is there a way to loop through the data like this, apply the function below (or a similar one) to each pair, and add the result to the DataFrame as a new variable for each film?

    # packages
    from scipy.spatial import distance
    from sklearn.feature_extraction.text import CountVectorizer


    def cosine_distance_countvectorizer_method(s1, s2):
        # sentences to list
        allsentences = [s1, s2]

        # text to vector
        vectorizer = CountVectorizer()
        all_sentences_to_vector = vectorizer.fit_transform(allsentences)
        text_to_vector_v1 = all_sentences_to_vector.toarray()[0].tolist()
        text_to_vector_v2 = all_sentences_to_vector.toarray()[1].tolist()

        # distance of similarity
        cosine = distance.cosine(text_to_vector_v1, text_to_vector_v2)
        print('Similarity of two sentences are equal to ', round((1 - cosine) * 100, 2), '%')
        return cosine

It is then called like this:

    cosine_distance_countvectorizer_method(ss1, ss2)

Data example: [image of the data snippet, not reproduced here]

Answers (1)
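
One possible approach, offered as a sketch rather than a definitive solution: instead of looping row by row, line up each film with its comparison text inside the DataFrame and then call the distance function once per row. The column names FranID and Seq are taken from the question. The plot column name "Plot", the helper names add_franchise_distances and token_cosine_distance, and the sample rows are assumptions made up for illustration. token_cosine_distance is only a dependency-free stand-in that counts whitespace-separated tokens; it is not the CountVectorizer method from the question.

```python
import math
from collections import Counter

import pandas as pd


def token_cosine_distance(s1, s2):
    """Stand-in cosine distance on whitespace token counts (hypothetical helper)."""
    c1, c2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(c1[t] * c2[t] for t in c1)
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(
        sum(v * v for v in c2.values())
    )
    return 1.0 - dot / norm if norm else float("nan")


def add_franchise_distances(df, dist_fn, text_col="Plot"):
    """Return a sorted copy of df with dist_to_orig and dist_to_prev columns.

    Assumes columns FranID and Seq exist and that Seq 0 is the original film.
    """
    # Sort so that, within each franchise, rows run Seq 0, 1, 2, ...
    out = df.sort_values(["FranID", "Seq"]).reset_index(drop=True)
    grouped = out.groupby("FranID")[text_col]

    # Plot of the first film of each franchise, repeated on every row.
    orig_plot = grouped.transform("first")
    # Plot of the preceding film; missing for the first film of a franchise.
    prev_plot = grouped.shift(1)

    # Only sequels get a value; the original film keeps NaN in both columns.
    out["dist_to_orig"] = [
        dist_fn(plot, orig) if pd.notna(prev) else float("nan")
        for plot, orig, prev in zip(out[text_col], orig_plot, prev_plot)
    ]
    out["dist_to_prev"] = [
        dist_fn(plot, prev) if pd.notna(prev) else float("nan")
        for plot, prev in zip(out[text_col], prev_plot)
    ]
    return out


# Made-up rows for illustration only; the real data has its own plots.
films = pd.DataFrame(
    {
        "FranID": [1, 1, 1, 2, 2],
        "Seq": [0, 1, 2, 0, 1],
        "Plot": [
            "a farm boy joins a rebellion",
            "the rebellion is hunted by the empire",
            "the farm boy confronts the empire",
            "a shark terrorises a beach town",
            "another shark returns to the beach town",
        ],
    }
)
result = add_franchise_distances(films, token_cosine_distance)
print(result[["FranID", "Seq", "dist_to_orig", "dist_to_prev"]])
```

Passing the distance function as an argument means cosine_distance_countvectorizer_method from the question can be supplied in place of the stand-in, since it also takes two strings and returns the distance. Note that it prints a line on every call, so the print may be worth removing before running it over the whole DataFrame.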
