I have extended this SO question and am comparing two LaTeX equations. Here is an example with two versions of the quadratic formula.
eqn1 = r"*=\frac{-*\pm\sqrt{*^2-4ac}}{2a}"
eqn2 = r"x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}"
I need these two to compare as a match, because I have used * in place of x and b. What I am doing is converting each equation to a word list:
eqn1_word = ['*', 'frac', '*', 'pm', 'sqrt', '*', '2', '4ac', '2a']
eqn2_word = ['x', 'frac', 'b', 'pm', 'sqrt', 'b', '2', '4ac', '2a']
so the vectors are:
eqn1_vec= Counter({'*': 3, 'frac': 1, 'sqrt': 1, '2': 1, '2a': 1, '4ac': 1, 'pm': 1})
eqn2_vec = Counter({'b': 2, 'frac': 1, 'sqrt': 1, '2': 1, '2a': 1, '4ac': 1, 'x': 1, 'pm': 1})
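The question does not show get_word. As a rough sketch, assuming a hypothetical regex tokenizer that keeps runs of letters, digits and * and drops LaTeX punctuation (\, {, }, ^, -, =), the word lists and vectors above can be reproduced like this in Python 3:

```python
import re
from collections import Counter


def get_word(eqn):
    # Hypothetical stand-in for the asker's get_word: keep runs of
    # letters, digits and '*', dropping \ { } ^ - = and other symbols
    return re.findall(r"[A-Za-z0-9*]+", eqn)


# Raw strings, so that "\f" in "\frac" is not read as a form feed
eqn1 = r"*=\frac{-*\pm\sqrt{*^2-4ac}}{2a}"
eqn2 = r"x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}"

eqn1_word = get_word(eqn1)
eqn2_word = get_word(eqn2)

# Word-count vectors used by the cosine similarity
eqn1_vec = Counter(eqn1_word)
eqn2_vec = Counter(eqn2_word)

print(eqn1_word)
print(eqn2_word)
```

With this tokenizer a term such as 2*a would stay a single token '2*a', so a real get_word may need different rules depending on how * is used.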
My extension is to compute the fraction of * tokens in eqn1_word, then compute the ordinary cosine similarity as given in that answer, and finally add the two values; the sum should come out close to 1.
This works for most scenarios (when only one variable is replaced by *). Here, however, * has a count of 3 in eqn1_vec, whereas in eqn2_vec b has a count of 2 and x has a count of 1.
For more background, please check this. Based on that reference, my code is as follows:
import math
import traceback
from collections import Counter

def get_cosine(self, c_eqn1_eqn, c_eqn2_eqn):
    print 'c_eqn1_eqn = ', c_eqn1_eqn
    print 'c_eqn2_eqn = ', c_eqn2_eqn
    _special_symbol = float(c_eqn1_eqn.count("*"))
    cos_result = 0
    symbol_percentage = 0
    try:
        eqn1_vector = Counter(self.get_word(c_eqn1_eqn))  # get_word returns the word list
        eqn2_vector = Counter(self.get_word(c_eqn2_eqn))
        _words = sum([x for x in eqn1_vector.values()])
        if "*" in eqn2_vector:
            _special_symbol -= eqn2_vector["*"]
        print '_special_symbol = ', _special_symbol
        print '_words @ last = ', _words
        try:
            symbol_percentage = _special_symbol / _words
        except ZeroDivisionError:
            symbol_percentage = 0.0
    except Exception as exp:
        print "Exception at converting equation to vector", exp
        traceback.print_exc()
    else:
        intersection = set(eqn1_vector.keys()) & set(eqn2_vector.keys())
        numerator = sum([eqn1_vector[x] * eqn2_vector[x] for x in intersection])
        _sum1 = sum([eqn1_vector[x]**2 for x in eqn1_vector.keys()])
        _sum2 = sum([eqn2_vector[x]**2 for x in eqn2_vector.keys()])
        denominator = math.sqrt(_sum1) * math.sqrt(_sum2)
        print 'numerator = ', numerator
        print 'denominator = ', denominator
        if not denominator:
            cos_result = 0
        else:
            cos_result = float(numerator) / denominator
        print cos_result
    final_result = float(symbol_percentage) + cos_result
    return final_result if final_result <= 1.0 else 1
The problem is that the numerator stays small because the intersection is small. I copied this from my class, so please ignore self.
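To see the size of the gap, here is a small Python 3 sketch that recomputes the question's score directly on eqn1_vec and eqn2_vec (plain_cosine is a hypothetical helper name, and the counters are typed in rather than produced by get_word). Only six tokens are shared, so the numerator is 6 while the denominator is sqrt(15) * sqrt(11), about 12.85; the cosine is about 0.467, the * fraction is 3/9, about 0.333, and their sum comes out around 0.80 rather than 1.

```python
import math
from collections import Counter

# Word-count vectors from the question
eqn1_vec = Counter({'*': 3, 'frac': 1, 'sqrt': 1, '2': 1, '2a': 1, '4ac': 1, 'pm': 1})
eqn2_vec = Counter({'b': 2, 'frac': 1, 'sqrt': 1, '2': 1, '2a': 1, '4ac': 1, 'x': 1, 'pm': 1})


def plain_cosine(v1, v2):
    # Ordinary cosine similarity: dot product over shared tokens,
    # divided by the product of the two full vector norms
    common = v1.keys() & v2.keys()
    numerator = sum(v1[t] * v2[t] for t in common)
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return numerator / (norm1 * norm2) if norm1 and norm2 else 0.0


# Mirrors _special_symbol / _words in the question (a missing key counts as 0)
star_fraction = (eqn1_vec['*'] - eqn2_vec['*']) / sum(eqn1_vec.values())
cosine = plain_cosine(eqn1_vec, eqn2_vec)
score = min(star_fraction + cosine, 1.0)
print(round(star_fraction, 3), round(cosine, 3), round(score, 3))
```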
How can I solve this? Thanks in advance. If there is any mistake in the question, or my approach is wrong, please let me know.
Upvotes: 1
Views: 4838
I found a solution to this problem.
Since we cannot (and should not) increase the numerator, I decided to adjust the denominator instead. My logic is to reduce the denominator when the number of * tokens equals the number of non-intersecting tokens in eqn2; if they differ, the denominator is left unchanged. With this, I no longer need to compute the percentage of "*" or add it to the cosine result.
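To make the rule concrete for the example above, write v_1 and v_2 for eqn1_vector and eqn2_vector, I for their shared tokens (frac, pm, sqrt, 2, 4ac, 2a), and N for eqn2's unmatched tokens (b, x) together with the * token (this notation is introduced here only for illustration). The unmatched tokens in eqn2 occur 2 + 1 = 3 times, which equals the three * in eqn1, so N is dropped from both norms while the numerator stays the same:

```latex
\cos_{\text{plain}}
  = \frac{\sum_{t \in I} v_1(t)\, v_2(t)}
         {\sqrt{\sum_{t} v_1(t)^2}\, \sqrt{\sum_{t} v_2(t)^2}}
  = \frac{6}{\sqrt{15}\, \sqrt{11}} \approx 0.467

\cos_{\text{adj}}
  = \frac{\sum_{t \in I} v_1(t)\, v_2(t)}
         {\sqrt{\sum_{t \notin N} v_1(t)^2}\, \sqrt{\sum_{t \notin N} v_2(t)^2}}
  = \frac{6}{\sqrt{6}\, \sqrt{6}} = 1
```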
import math
import traceback
from collections import Counter

def get_cosine(c_eqn1, c_eqn2):
    _special_symbol = float(c_eqn1.count("*"))
    cos_result = 0
    try:
        eqn1_vector = Counter(get_word(c_eqn1))  # get_word returns the word list
        eqn2_vector = Counter(get_word(c_eqn2))
        _special_symbol = 0
        spe_list = list()
        # Store the number of * and the tokens that contain *
        for _val in eqn1_vector.keys():
            if "*" in _val:
                _special_symbol += eqn1_vector[_val]
                spe_list.append(_val)
        if "*" in eqn2_vector:
            _special_symbol -= eqn2_vector["*"]
    except Exception as exp:
        print "Exception at converting equation to vector", exp
        traceback.print_exc()
    else:
        intersection = set(eqn1_vector.keys()) & set(eqn2_vector.keys())
        numerator = sum([eqn1_vector[x] * eqn2_vector[x]
                         for x in intersection])
        non_intersection_sum = 0
        non_intersection_value = list()
        # Store the count of tokens in eqn2 that are not matched in eqn1
        for _val in eqn2_vector.keys():
            if _val not in intersection:
                non_intersection_sum += eqn2_vector[_val]
                non_intersection_value.append(_val)
        # Join both non-intersecting lists
        if non_intersection_value:
            non_intersection_value.extend(spe_list)
        # If the * count and the non-intersecting count differ,
        # empty the list so the denominator is left unchanged
        if _special_symbol != non_intersection_sum:
            non_intersection_value = list()
        # Cosine similarity formula, skipping the excluded tokens in the norms
        _sum1 = sum([eqn1_vector[x]**2 for x in eqn1_vector.keys() if x not in non_intersection_value])
        _sum2 = sum([eqn2_vector[x]**2 for x in eqn2_vector.keys() if x not in non_intersection_value])
        denominator = math.sqrt(_sum1) * math.sqrt(_sum2)
        if not denominator:
            cos_result = 0
        else:
            cos_result = float(numerator) / denominator
    return cos_result if cos_result <= 1.0 else 1
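For Python 3, where dict.has_key and the print statement no longer exist, the same denominator adjustment can be sketched as below. It is a port under assumptions, not the original code: it uses a hypothetical regex tokenizer in place of get_word, drops the try/except, and computes the denominator as sqrt(sum1 * sum2), which is mathematically the same as the product of the two square roots.

```python
import math
import re
from collections import Counter


def get_word(eqn):
    # Hypothetical tokenizer: runs of letters, digits and '*'
    return re.findall(r"[A-Za-z0-9*]+", eqn)


def get_cosine(c_eqn1, c_eqn2):
    """Cosine similarity that drops the unmatched tokens from the norms
    when the '*' wildcards in c_eqn1 account for exactly the tokens of
    c_eqn2 that have no counterpart in c_eqn1."""
    v1 = Counter(get_word(c_eqn1))
    v2 = Counter(get_word(c_eqn2))

    # Tokens of eqn1 containing '*', and their total count
    # (minus any literal '*' that eqn2 also contains)
    star_tokens = [t for t in v1 if "*" in t]
    star_count = sum(v1[t] for t in star_tokens) - v2["*"]

    intersection = v1.keys() & v2.keys()
    numerator = sum(v1[t] * v2[t] for t in intersection)

    # Tokens of eqn2 with no counterpart in eqn1
    unmatched = [t for t in v2 if t not in intersection]
    unmatched_count = sum(v2[t] for t in unmatched)

    # Exclude them (and the '*' tokens) only when the counts agree
    excluded = set()
    if unmatched and star_count == unmatched_count:
        excluded = set(unmatched) | set(star_tokens)

    sum1 = sum(c * c for t, c in v1.items() if t not in excluded)
    sum2 = sum(c * c for t, c in v2.items() if t not in excluded)
    denominator = math.sqrt(sum1 * sum2)
    if not denominator:
        return 0.0
    return min(numerator / denominator, 1.0)


print(get_cosine(r"*=\frac{-*\pm\sqrt{*^2-4ac}}{2a}",
                 r"x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}"))
```

On the question's pair this scores 1.0. If only two * are used (for example r"*=\frac{-*\pm\sqrt{b^2-4ac}}{2a}"), the star count (2) no longer matches eqn2's single unmatched token x, so the norms are left alone and the score is 8/11, about 0.727.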
Upvotes: 1