jerry9855

Reputation: 3

How to use Beautiful Soup to extract function string in <script> tag?

In a given .html page, I have a script tag like the one below. How can I use Beautiful Soup to extract the string returned by "function getData()"?

<script>
function getData()
{
	return "zip,city,state,MedianIncome,MedianIncomeRank,CostOfLivingIndex,CostOfLivingRank\n10452,Bronx,NY,20606,2,147.7,74";
}

function getResultsCount()
{
	return "1";
}

</script>

Upvotes: 0

Views: 2740

Answers (1)

alecxe

Reputation: 473763

One way, arguably the simplest, is to use a regular expression both to locate the script element and to extract the desired string:

import re

from bs4 import BeautifulSoup

data = """
<script>
function getData()
{
    return "zip,city,state,MedianIncome,MedianIncomeRank,CostOfLivingIndex,CostOfLivingRank\n10452,Bronx,NY,20606,2,147.7,74";
}

function getResultsCount()
{
    return "1";
}

</script>
"""

soup = BeautifulSoup(data, "html.parser")

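# Match the first `return "...";` statement; DOTALL lets the match span the newline inside the string
pattern = re.compile(r'return "(.*?)";$', re.MULTILINE | re.DOTALL)
# `string=` is the current name of the older `text=` argument (Beautiful Soup 4.4.0+)
script = soup.find("script", string=pattern)

print(pattern.search(script.string).group(1))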

Prints:

zip,city,state,MedianIncome,MedianIncomeRank,CostOfLivingIndex,CostOfLivingRank
10452,Bronx,NY,20606,2,147.7,74
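
Note that the pattern above matches the first `return "...";` in the script, which happens to be the one in getData(). If the functions could appear in a different order, the pattern can be anchored to the function name. The following is a minimal sketch of that idea, not part of the original answer: the helper name `extract_return_string` is hypothetical, and it assumes the function takes no arguments and its body is a single `return "...";` statement. With Beautiful Soup, the `script_text` argument would be the `script.string` value from the soup.

```python
import re


def extract_return_string(script_text, function_name):
    """Return the string literal returned by the named JavaScript function, or None.

    Hypothetical helper: assumes `function name() { return "..."; }` with no arguments.
    """
    # re.escape guards against regex metacharacters in the function name.
    # DOTALL lets the lazy group span a literal newline inside the string.
    pattern = re.compile(
        r'function\s+' + re.escape(function_name)
        + r'\s*\(\s*\)\s*\{\s*return\s+"(.*?)";',
        re.DOTALL,
    )
    match = pattern.search(script_text)
    return match.group(1) if match else None


# Sample script text; in practice this would come from the parsed <script> tag.
script_text = """
function getData()
{
    return "zip,city,state,MedianIncome,MedianIncomeRank,CostOfLivingIndex,CostOfLivingRank\n10452,Bronx,NY,20606,2,147.7,74";
}

function getResultsCount()
{
    return "1";
}
"""

data_value = extract_return_string(script_text, "getData")
count_value = extract_return_string(script_text, "getResultsCount")
print(data_value)
print(count_value)
```

The lazy group stops at the first `";`, so a returned string that itself contains that sequence would be cut short; a real JavaScript parser is the safer choice for such input.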

Alternatively, you can use a JavaScript parser, such as slimit, to parse the script contents instead of matching them with a regular expression.

Upvotes: 1
