Reputation: 693
I'm trying to extract the numbers and the dates from a chart on Zillow. The URL is: https://www.zillow.com/austin-tx/home-values/
The part of the HTML I'm working with is:
<ul class="legend-entries" id="yui_3_18_1_1_1607476788112_1009">
<li class="legend-value">Oct 2021</li>
<li class="legend-entry legend-entry-0" id="yui_3_18_1_1_1607476788112_1330">Austin $464K</li>
<li class="hide legend-entry legend-entry-1"></li>
<li class="hide legend-entry legend-entry-2"></li>
<li class="hide legend-entry legend-entry-3"></li>
<li class="hide legend-entry legend-entry-4"></li>
<li class="hide legend-entry legend-entry-5"></li>
<li class="hide legend-entry legend-entry-6"></li>
</ul>
I'm trying to parse out the legend-value (Oct 2021) and the legend-entry ($464K) text. However, this data lives in the chart, and the values in the HTML change whenever you move the mouse over its points.
Here's my code so far:
import requests
from bs4 import BeautifulSoup

req_headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.8',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
}
all_data = []
url = 'https://www.zillow.com/austin-tx/home-values/'
s = requests.Session()
r = s.get(url, headers=req_headers)
soup = BeautifulSoup(r.content, 'html.parser')
#soup.find (class_= 'legend-entries')
for ul in soup.find_all('ul'):
    lis = ul.find_all('li')
    for elem in lis:
        all_data.append(elem.text.strip())
I feel like this should work, but it returns nothing. The commented-out line in my code at least returns the legend-entries tag. I'm not sure how to get the values themselves.
Upvotes: 0
Views: 103
Reputation: 20022
That graph is filled in from an API call, so its data isn't in the static HTML that BeautifulSoup parses. You can call that API directly and rebuild the data.
Here's how:
from datetime import datetime
import requests
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:83.0) Gecko/20100101 Firefox/83.0",
    "X-Requested-With": "XMLHttpRequest",
}
api_url = "https://www.zillow.com/ajax/homevalues/data/timeseries.json?r=10221&m=zhvi_plus_forecast&dt=111"
graph = requests.get(api_url, headers=headers).json()
time_ = graph["10221;zhvi_plus_forecast;111"]["data"]
for moment in time_:
    date = datetime.fromtimestamp(moment["x"] // 1000).date()
    value = moment["y"]
    print(f"{date} - ${value}")
Output:
2010-12-31 - $224771
2011-01-31 - $224297
2011-02-28 - $223623
2011-03-31 - $223053
2011-04-30 - $222571
2011-05-31 - $221931
2011-06-30 - $221322
2011-07-31 - $220837
2011-08-31 - $221413
2011-09-30 - $222088
2011-10-31 - $222520
2011-11-30 - $222665
2011-12-31 - $222788
2012-01-31 - $223433
2012-02-29 - $224288
2012-03-31 - $225461
and so on ...
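If you want the same shape the legend shows in the question (a month label like "Oct 2021" and a rounded price like "$464K"), each point can be reformatted after fetching. This is only a rough sketch: `to_legend` is a made-up helper name, the two timestamps in `sample` are assumed (the values are the first two from the output above), and reading the timestamps as UTC is my choice so the dates don't shift with the local timezone.

```python
from datetime import datetime, timezone


def to_legend(point):
    """Turn one {"x": ms_timestamp, "y": value} point into (label, price)."""
    # Interpret the millisecond timestamp as UTC so the result does not
    # depend on the local timezone of the machine running the script.
    moment = datetime.fromtimestamp(point["x"] / 1000, tz=timezone.utc)
    label = moment.strftime("%b %Y")         # month label in the legend's style
    price = f"${round(point['y'] / 1000)}K"  # price rounded to whole thousands
    return label, price


# Hypothetical sample shaped like graph[...]["data"]; timestamps are assumed.
sample = [
    {"x": 1293753600000, "y": 224771},
    {"x": 1296432000000, "y": 224297},
]

for point in sample:
    print(*to_legend(point))
```

With real data you would pass `time_` from the snippet above instead of `sample`.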
Or, you can plot that and have your own graph (who says you can't, right?).
from datetime import datetime
import matplotlib.pyplot as plt
import pandas as pd
import requests
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:83.0) Gecko/20100101 Firefox/83.0",
    "X-Requested-With": "XMLHttpRequest",
}
api_url = "https://www.zillow.com/ajax/homevalues/data/timeseries.json?r=10221&m=zhvi_plus_forecast&dt=111"
graph = requests.get(api_url, headers=headers).json()
df = pd.DataFrame(graph["10221;zhvi_plus_forecast;111"]["data"])
plt.figure(1)
plt.plot(df['x'].apply(lambda x: datetime.fromtimestamp(x // 1000).date()), df['y'])
plt.show()
Output: [line plot of value (y) against date (x); image not included]
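Both snippets above hard-code the query string and the matching JSON key. The key looks like the three query parameters joined with ";", so a small helper can build both from one place. Treat this as a sketch under that assumption: `build_request`, `region_id` and `metric` are names I made up, what `r`, `m` and `dt` actually mean is not documented here, and I have only seen the pattern for the one URL in this answer.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.zillow.com/ajax/homevalues/data/timeseries.json"


def build_request(region_id, metric="zhvi_plus_forecast", dt=111):
    """Return (url, json_key) for one region/metric/dt combination."""
    query = urlencode({"r": region_id, "m": metric, "dt": dt})
    url = f"{BASE_URL}?{query}"
    # Assumption: the response key is the three parameters joined by ";",
    # as seen in "10221;zhvi_plus_forecast;111".
    key = f"{region_id};{metric};{dt}"
    return url, key


url, key = build_request(10221)
print(url)
print(key)
```

You would then use it as `requests.get(url, headers=headers).json()[key]["data"]`. If the key turns out not to follow this pattern for another region, printing the top-level keys of the JSON response should show what it is.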
Upvotes: 2