Reputation: 4115
I have a slightly tricky task: I need to find some HTML inside a JavaScript variable and traverse it.
The variables look like this:
<script>
var someVar = new something.Something({
content: 'This text has to be found<br /><table></table>',
size: 230
});
....
</script>
I do not know the name of the JS variable, so it has to be located via the string 'This text has to be found'. Once I have verified that the match really is inside a JS variable, I want to fetch the value <br /><table></table> in order to traverse it.
Upvotes: 2
Views: 8616
Reputation: 473903
One approach is to use a JavaScript parser, slimit in this case. The idea is to find all script tags, parse the code of each one, walk the syntax tree, and check whether the text you are looking for appears on the right-hand side of any assignment node:
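from bs4 import BeautifulSoup
from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor

data = """
<script>
var someVar = new something.Something({
    content: 'This text has to be found<br /><table></table>',
    size: 230
});
</script>
"""

text_to_find = 'This text has to be found'

# Name the HTML parser explicitly so BeautifulSoup does not have to guess one.
soup = BeautifulSoup(data, 'html.parser')
for script in soup.find_all('script'):
    # Parse the JavaScript source of each script tag into a syntax tree.
    parser = Parser()
    tree = parser.parse(script.text)
    for node in nodevisitor.visit(tree):
        # Object-literal properties such as content: '...' show up as Assign nodes.
        if isinstance(node, ast.Assign):
            # Nodes without a value attribute fall back to an empty string.
            value = getattr(node.right, 'value', '')
            if text_to_find in value:
                print(value)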
This prints 'This text has to be found<br /><table></table>'.
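To get from that printed value to the <br /><table></table> part the question asks for, something like the following might work. It is only a sketch: it assumes Python 3, assumes the value still carries its surrounding quotes exactly as in the output above, and does not unescape JavaScript escape sequences such as \'. The names extract_fragment and TagCollector are made up for illustration. It uses the standard library's html.parser so it runs without extra installs; the same fragment could just as well be handed to BeautifulSoup.

```python
from html.parser import HTMLParser

# Value as printed above, including the quotes of the JS string literal.
value = "'This text has to be found<br /><table></table>'"
text_to_find = 'This text has to be found'


def extract_fragment(value, text_to_find):
    """Return whatever follows text_to_find inside a quoted JS string literal."""
    # Drop one pair of matching quotes around the literal, if present.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in '\'"':
        value = value[1:-1]
    # Keep only the part after the first occurrence of the search text.
    _, found, rest = value.partition(text_to_find)
    return rest if found else ''


class TagCollector(HTMLParser):
    """Record the name of every opening tag, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <br /> also end up here by default.
        self.tags.append(tag)


fragment = extract_fragment(value, text_to_find)
collector = TagCollector()
collector.feed(fragment)
collector.close()
print(fragment)
print(collector.tags)
```

If the search text is not part of the value, extract_fragment returns an empty string, so the traversal step simply finds no tags.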
I am not sure whether this fits your needs completely, but I hope it is at least a starting point.
Upvotes: 5