Reputation: 87
I am trying to parse LaTeX code out of HTML that looks like this:
string = " your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "
I want to replace every piece of LaTeX code with the output of a function that takes the LaTeX code as an argument. (Since I have a problem finding the correct pattern, the function extract returns an empty string for the moment.)
I tried:
latex_end = "\)"
latex_start = "\("
string = re.sub(r'{}.*?{}'.format(latex_start, latex_end), extract, string)
Result:
your answer is wrong! Solution: based on \= 0 \) and \=0\) beeing ...
Expected:
your answer is wrong! Solution: based on and beeing ...
Any idea why it does not find the pattern? Is there a way to implement it?
Upvotes: 1
Views: 144
Reputation: 44128
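You should use a raw string for your definition of string, since \v is otherwise interpreted as an escape sequence (a vertical tab). The pattern also needs a literal backslash: in a regular expression, \( matches only the parenthesis, so the backslash itself has to be escaped as \\.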
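To see what the raw prefix changes, here is a small sketch that compares the two literals. The variable names plain and raw are only illustrative and not part of the original code.

```python
# Compare how Python stores the same text with and without the r prefix.
plain = "\vec{g}"   # "\v" is the vertical-tab escape, so the backslash and the "v" are lost
raw = r"\vec{g}"    # the raw literal keeps both the backslash and the "v"

print(repr(plain))
print(repr(raw))
print(len(plain), len(raw))  # the raw version is one character longer
```
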
import re
string = r" your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "
string = re.sub(r'\\\(.*?\\\)', '', string)
print(string)
Prints:
your answer is wrong! Solution: based on and beeing ...
If you need to have variables for the start and end:
latex_end = r"\\\)"
latex_start = r"\\\("
string = re.sub(r'{}.*?{}'.format(latex_start, latex_end), '', string)
print(string)
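If you would rather keep the delimiters exactly as they appear in the text, one option is to let re.escape build the escaped pattern for you. This is a minimal sketch, assuming the delimiters are plain literal strings; the names text, pattern and cleaned are only illustrative.

```python
import re

text = r"based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing"

# The delimiters as they literally occur: a backslash followed by a parenthesis.
latex_start = r"\("
latex_end = r"\)"

# re.escape adds the escaping the regex engine needs to match them literally.
pattern = re.compile(re.escape(latex_start) + r".*?" + re.escape(latex_end))

cleaned = pattern.sub("", text)
print(cleaned)
```

Note that . does not match a newline by default, so formulas spanning several lines would need the re.DOTALL flag.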
Upvotes: 1
Reputation: 642
This happens because backslashes act as escape characters both in Python string literals and in regular expressions, which makes these situations tricky to handle. Here are two quick ways of making it work:
import re
extract = lambda a: ""
# Using no raw components
string = " your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "
# Without raw strings every regex backslash is doubled: "\\\\\\(" is the regex \\\(
latex_bounds = ("\\\\\\(", "\\\\\\)")
print(re.sub('{}.*?{}'.format(*latex_bounds), extract, string))
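# Using all raw components (the literal itself must be raw; r"%s" % string cannot undo escapes that were already processed)
string = r" your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "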
latex_bounds = (r"\\\(", r"\\\)")
print(re.sub(r'{}.*?{}'.format(*latex_bounds), extract, string))
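Once the pattern matches, the replacement function can do real work. The sketch below assumes you want only the code between the delimiters, so it adds a capture group; render is a hypothetical stand-in for whatever conversion you actually need.

```python
import re

text = r"based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing"

def render(latex):
    # Hypothetical placeholder conversion: wrap the LaTeX code in brackets.
    return "[" + latex.strip() + "]"

def extract(match):
    # re.sub calls this with a Match object; group(1) is the text between \( and \).
    return render(match.group(1))

result = re.sub(r"\\\((.*?)\\\)", extract, text)
print(result)
```

re.sub passes a Match object rather than a string to the callable, which is why extract reads match.group(1) instead of using its argument directly.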
Upvotes: 1