JavaCake

Reputation: 4115

Finding JavaScript variable with certain string with BeautifulSoup

I have a slightly tricky task: I need to find some HTML inside a JavaScript variable and traverse it.

The variables look like this:

<script>
var someVar = new something.Something({
    content: 'This text has to be found<br /><table></table>',
    size: 230
});
....
</script>

I do not know the name of the JS variable, so it has to be found via the 'This text has to be found' string. After verifying that the match really is inside a JS variable, I want to fetch the value <br /><table></table> so that I can traverse it.

Upvotes: 2

Views: 8616

Answers (1)

alecxe

Reputation: 473903

One approach is to use a JavaScript parser, slimit in this case. The idea is to find all script tags, iterate over them, parse the code, walk the syntax tree, and check whether the text you want appears on the right-hand side of each assignment node:

from bs4 import BeautifulSoup
from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor

data = """
<script>
var someVar = new something.Something({
    content: 'This text has to be found<br /><table></table>',
    size: 230
});
</script>
"""
text_to_find = 'This text has to be found'

soup = BeautifulSoup(data, 'html.parser')  # name the parser explicitly

for script in soup.find_all('script'):
    parser = Parser()
    tree = parser.parse(script.text)
    for node in nodevisitor.visit(tree):
        if isinstance(node, ast.Assign):
            value = getattr(node.right, 'value', '')
            if text_to_find in value:
                print(value)

Prints 'This text has to be found<br /><table></table>'.
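To get from that printed value to the HTML the question wants to traverse, something along these lines might work. This is only a sketch: extract_html and TagCollector are hypothetical helper names, and it uses the standard-library html.parser instead of BeautifulSoup so that it runs without extra dependencies. It assumes the matched value still carries its surrounding quote characters (as in the output above) and contains no JavaScript escape sequences that would need decoding.

```python
from html.parser import HTMLParser


def extract_html(js_string_value, text_to_find):
    """Return the part of a quoted JS string that follows text_to_find."""
    # Assumption: the value still includes its surrounding quote characters.
    unquoted = js_string_value.strip('\'"')
    # Keep only what comes after the marker text (empty if it is absent).
    _, _, html = unquoted.partition(text_to_find)
    return html


class TagCollector(HTMLParser):
    """Hypothetical helper that records the name of every start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


value = "'This text has to be found<br /><table></table>'"
html = extract_html(value, 'This text has to be found')

collector = TagCollector()
collector.feed(html)
collector.close()
print(html)
print(collector.tags)
```

The extracted fragment could equally be passed to BeautifulSoup(html, 'html.parser') if you prefer to traverse it with find_all and friends.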

I am not sure whether this fits your needs completely, but I hope it is at least a starting point.


Upvotes: 5
