Reputation: 23
I am trying to extract only the rented prices (green dots) from this site, but I can't find where the data is coming from. I want to use BeautifulSoup or Scrapy to do the web scraping. Is the data loaded as JSON, or how does it get onto the page? Apologies for such a broad question; I am relatively new to Python. There are URLs in the page source that may lead to the data, but I can't figure it out. Any push in the right direction would be much appreciated.
Here is the website: https://www.redweek.com/whats-my-timeshare-worth/P5035-wyndham-bonnet-creek-resort/rental-historical
Upvotes: 2
Views: 467
Reputation: 28565
Hamza already said where to find it, but to reiterate: when you are on the site, right-click and select "Inspect" (or press Ctrl+Shift+I). In the panel that opens, go to Network, then XHR, and look at the Headers tab for the request (you may need to reload the page once the panel is open).
Here's the code to turn that JSON into a table:
import pandas as pd
import requests

# XHR endpoint found in the browser's Network tab
url = 'https://www.redweek.com/whats-my-timeshare-worth/xhr?resort_id=5035&type=rental&active=0'
headers = {'User-Agent': 'Mozilla/5.0'}
jsonData = requests.get(url, headers=headers).json()

# Map each column position to its label, then name the first column 'Week'
cols = {}
for idx, each in enumerate(jsonData['cols']):
    cols.update({idx: each['label']})
cols.update({0: 'Week'})

# Each row stores its cells under 'c'; each cell's value is under 'v'
rows = []
for row in jsonData['rows']:
    temp_row = {}
    for idx, each in enumerate(row['c']):
        temp_row.update({cols[idx]: each['v']})
    rows.append(temp_row)

df = pd.DataFrame(rows)

# Merge the per-status price columns into one 'Price' column, then drop them
df['Price'] = df['Rented'].fillna(df['Unknown'])
df = df.drop(['Not Rented', 'Active posting', 'Rented', 'Unknown'], axis=1)
Output:
print(df)
Bedrooms Status Week Price
0 1 Unknown 52 120.0
1 1 Unknown 52 130.0
2 1 Unknown 53 120.0
3 1 Unknown 1 60.0
4 1 Unknown 3 140.0
5 1 Unknown 5 100.0
6 1 Unknown 11 170.0
7 1 Unknown 11 90.0
8 1 Unknown 20 90.0
9 1 Unknown 22 130.0
10 1 Unknown 23 100.0
11 1 Unknown 24 100.0
12 1 Unknown 24 180.0
13 1 Unknown 25 100.0
14 1 Unknown 27 90.0
15 1 Unknown 28 90.0
16 1 Unknown 29 90.0
17 1 Unknown 30 90.0
18 1 Unknown 47 100.0
19 1 Unknown 52 100.0
20 1 Unknown 1 140.0
21 1 Unknown 10 140.0
22 1 Unknown 12 130.0
23 1 Unknown 14 100.0
24 1 Unknown 14 160.0
25 1 Unknown 26 110.0
26 1 Unknown 34 90.0
27 1 Unknown 39 140.0
28 1 Unknown 43 160.0
29 1 Unknown 51 100.0
... ... ... ...
4035 3 Rented 12 250.0
4036 3 Rented 13 270.0
4037 3 Rented 14 230.0
4038 3 Rented 18 280.0
4039 3 Rented 27 180.0
4040 3 Rented 35 90.0
4041 4 Rented 53 330.0
4042 4 Rented 15 170.0
4043 4 Rented 14 310.0
4044 4 Rented 18 250.0
4045 4 Rented 19 250.0
4046 4 Rented 46 300.0
4047 4 Rented 18 250.0
4048 4 Rented 19 200.0
4049 4 Rented 8 190.0
4050 4 Rented 12 240.0
4051 4 Rented 27 200.0
4052 4 Rented 7 240.0
4053 4 Rented 18 200.0
4054 4 Rented 47 310.0
4055 4 Rented 45 210.0
4056 4 Rented 7 320.0
4057 4 Rented 51 320.0
4058 4 Rented 9 300.0
4059 4 Rented 15 220.0
4060 4 Rented 39 210.0
4061 4 Rented 41 280.0
4062 4 Rented 4 200.0
4063 4 Rented 5 260.0
4064 4 Rented 35 130.0
[4065 rows x 4 columns]
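Since the question asks only for the rented prices, the table can be narrowed down by filtering on the Status column. This is a minimal sketch, assuming the DataFrame has the columns shown in the output above; the `sample` frame is a stand-in built from a few of those rows (with Bedrooms assumed to be integers), and `rented` is just an illustrative name.

```python
import pandas as pd

# A few rows copied from the output above, standing in for the full df.
sample = pd.DataFrame(
    {
        "Bedrooms": [1, 1, 3, 4],
        "Status": ["Unknown", "Unknown", "Rented", "Rented"],
        "Week": [52, 3, 12, 53],
        "Price": [120.0, 140.0, 250.0, 330.0],
    }
)

# Keep only the rows whose Status is exactly "Rented".
rented = sample[sample["Status"] == "Rented"].reset_index(drop=True)
print(rented)
```

On the real data, the same boolean mask would be applied to `df` instead of `sample`.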
Upvotes: 1
Reputation: 137
I am only going to help you find where the data is coming from; parsing the JSON is up to you. If you open the Network tab in Chrome's Developer Tools, you can see this request:
xhr?resort_id=5035&type=rental&active=0
Now, when you click on that, you will see the Request URL on the right-hand side. This is where the data is coming from:
https://redweek.com/whats-my-timeshare-worth/xhr?resort_id=5035&type=rental&active=0
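The Request URL above is a base path plus three query parameters (`resort_id`, `type`, `active`). As a rough sketch, it can be assembled in Python with the standard library. The helper name `build_xhr_url` is hypothetical, and whether the endpoint accepts other parameter values is an assumption not confirmed here; only the combination shown above is known to work.

```python
from urllib.parse import urlencode

# Base path taken from the Request URL above.
BASE = "https://redweek.com/whats-my-timeshare-worth/xhr"


def build_xhr_url(resort_id, listing_type="rental", active=0):
    """Hypothetical helper: rebuild the Request URL from its query parameters."""
    query = urlencode(
        {"resort_id": resort_id, "type": listing_type, "active": active}
    )
    return f"{BASE}?{query}"


url = build_xhr_url(5035)
print(url)
```

The resulting string can then be passed to `requests.get(url).json()` to fetch the data.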
Upvotes: 3