Reputation: 3
On a web page, how can I use BeautifulSoup to extract the value returned by "function getData()" in the HTML source? I get this error:
print(pattern.search(script.text).group(1))
AttributeError: 'NoneType' object has no attribute 'text'
import re
from urllib2 import urlopen
from bs4 import BeautifulSoup
url = "http://zipwho.com/?zip=91709&city=&filters=--_--_--_--&state=&mode=zip"
data = urlopen(url).read()
soup = BeautifulSoup(data, "html.parser")
pattern = re.compile(r'return "(.*?)";$', re.MULTILINE | re.DOTALL)
script = soup.find("script", text=pattern)
print(pattern.search(script.text).group(1))
Upvotes: 0
Views: 1724
Reputation: 97
I tried it on my computer (with requests instead of urllib2) and got this:
print(script)
>>> None
soup.find() returned None, which is why you get
AttributeError: 'NoneType' object has no attribute 'text'
I'm not sure what your regex is trying to match, but check it again. Try testing it first on the string you expect to get.
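For example, you can check the question's pattern offline against a hand-written sample before involving the page at all. The two sample strings below are made up for illustration (a shortened version of what the getData() script looks like); the second one uses Windows-style \r\n line endings. That case is worth testing because with re.MULTILINE, $ only matches right before a \n, so ";$ fails when a \r sits in between. I haven't confirmed which line endings the live page uses; this only shows how to test the pattern.

```python
import re

# The pattern from the question.
pattern = re.compile(r'return "(.*?)";$', re.MULTILINE | re.DOTALL)

# Hypothetical, shortened samples shaped like the getData() script.
# The \\n inside the JS string is a literal backslash + n, as in the page source.
sample_lf = 'function getData()\n{\nreturn "zip,city\\n91709,Chino Hills";\n}\n'
# Same text with Windows-style line endings (only real newlines are replaced).
sample_crlf = sample_lf.replace("\n", "\r\n")

for name, sample in [("LF", sample_lf), ("CRLF", sample_crlf)]:
    match = pattern.search(sample)
    # search() returns None instead of raising when nothing matches,
    # so check it before calling .group().
    print(name, match.group(1) if match else None)
```

If the pattern works on your sample but find() still returns None, the problem is in how the tag is being located rather than in the regex itself.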
Edit: try this:
url = "http://zipwho.com/?zip=91709&city=&filters=--_--_--_--&state=&mode=zip"
data = urlopen(url).read()
soup = BeautifulSoup(data, "html.parser")
script = soup.find("script")
print(script.text)
The output:
function getData()
{
return "zip,city,state,MedianIncome,MedianIncomeRank,CostOfLivingIndex,CostOfLivingRank,MedianMortgageToIncomeRatio,MedianMortgageToIncomeRank,OwnerOccupiedHomesPercent,OwnerOccupiedHomesRank,MedianRoomsInHome,MedianRoomsInHomeRank,CollegeDegreePercent,CollegeDegreeRank,ProfessionalPercent,ProfessionalRank,Population,PopulationRank,AverageHouseholdSize,AverageHouseholdSizeRank,MedianAge,MedianAgeRank,MaleToFemaleRatio,MaleToFemaleRank,MarriedPercent,MarriedRank,DivorcedPercent,DivorcedRank,WhitePercent,WhiteRank,BlackPercent,BlackRank,AsianPercent,AsianRank,HispanicEthnicityPercent,HispanicEthnicityRank\n91709,Chino Hills,CA,78336,96,260.8,93,25.6,92,84.9,81,6.4,90,37.5,87,44.9,88,66693,99,3.3,96,32.3,13,93.6,57,66.9,83,6.3,11,43.7,10,5.4,68,21.0,98,25.6,92";
}
function getResultsCount()
{
return "1";
}
It's a string:
type(script.text)
>>><class 'str'>
So now you can match a regex against it to get the result you want.
My code:
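A minimal sketch of that regex step, assuming the script text has the shape printed above. The script_text below is a shortened, hand-copied sample rather than the full page. Anchoring the pattern on "function getData()" keeps it from matching the return "1" in getResultsCount(), and \s* tolerates either \n or \r\n between lines.

```python
import re

# Shortened, hand-copied sample of the script text printed above (assumed shape).
script_text = '''function getData()
{
return "zip,city,state,MedianIncome\\n91709,Chino Hills,CA,78336";
}
function getResultsCount()
{
return "1";
}
'''

# Capture the string literal returned by getData() specifically.
match = re.search(r'function getData\(\)\s*\{\s*return "(.*?)";', script_text, re.DOTALL)
if match is None:
    raise ValueError("getData() not found in script text")
payload = match.group(1)

# The JS source has a literal backslash-n between the header row and the data row.
header, values = payload.split("\\n", 1)
row = dict(zip(header.split(","), values.split(",")))
print(row["city"], row["MedianIncome"])
```

Splitting on commas assumes no field contains a comma itself; if the data ever has quoted fields, Python's csv module would be the safer choice.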
import requests
from bs4 import BeautifulSoup
url = "http://zipwho.com/?zip=91709&city=&filters=--_--_--_--&state=&mode=zip"
data = requests.get(url)
soup = BeautifulSoup(data.content, "html.parser")
script = soup.find('script')
print(script.text)
Notice that I'm using requests instead of urllib2 (go ahead and try it).
Upvotes: 2