Reputation: 21
I would like to scrape the 'calendar' content on this page: https://gomore.dk/lejebil/27035
Can I crawl this content with Python Scrapy alone, without using Selenium? I can't find any relevant request in the browser's network tab. Thanks!
Upvotes: 0
Views: 116
Reputation: 21
After half a day of research, I found that I can use scrapy-splash to retrieve the JS-rendered content. It gives me the full content of the page, including the calendar information. However, the calendar information does not match what I expect, e.g. hour 1 for weekday 1 should be "danger", but it is not.
The page uses data-hour to represent each of the 24 hours of the day, and data-weekday 0 - 6 to represent Sunday, Monday, ..., Saturday. class="danger" marks a slot as blocked (shown in red).
<tr data-hour="0">
<td class="hour">
<div>
<small>00.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4" class="danger"></td>
<td data-weekday="5"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend danger"></td>
</tr>
<tr data-hour="1">
<td class="hour">
<div>
<small>01.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4" class="danger"></td>
<td data-weekday="5"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend danger"></td>
</tr>
<tr data-hour="2">
<td class="hour">
<div>
<small>02.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4" class="danger"></td>
<td data-weekday="5"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend danger"></td>
</tr>
<tr data-hour="3">
<td class="hour">
<div>
<small>03.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4" class="danger"></td>
<td data-weekday="5"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend danger"></td>
</tr>
<tr data-hour="4">
<td class="hour">
<div>
<small>04.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4" class="danger"></td>
<td data-weekday="5"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend danger"></td>
</tr>
<tr data-hour="5">
<td class="hour">
<div>
<small>05.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4" class="danger"></td>
<td data-weekday="5"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend danger"></td>
</tr>
<tr data-hour="6">
<td class="hour">
<div>
<small>06.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4" class="danger"></td>
<td data-weekday="5"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend danger"></td>
</tr>
<tr data-hour="7">
<td class="hour">
<div>
<small>07.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4" class="danger"></td>
<td data-weekday="5"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend danger"></td>
</tr>
<tr data-hour="8">
<td class="hour">
<div>
<small>08.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4" class="danger"></td>
<td data-weekday="5"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend danger"></td>
</tr>
<tr data-hour="9">
<td class="hour">
<div>
<small>09.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4" class="danger"></td>
<td data-weekday="5"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend danger"></td>
</tr>
<tr data-hour="10">
<td class="hour">
<div>
<small>10.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4" class="danger"></td>
<td data-weekday="5"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend danger"></td>
</tr>
<tr data-hour="11">
<td class="hour">
<div>
<small>11.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4" class="danger"></td>
<td data-weekday="5"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend danger"></td>
</tr>
<tr data-hour="12">
<td class="hour">
<div>
<small>12.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4" class="danger"></td>
<td data-weekday="5"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend danger"></td>
</tr>
<tr data-hour="13">
<td class="hour">
<div>
<small>13.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2" class="danger"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4"></td>
<td data-weekday="5"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend danger"></td>
</tr>
<tr data-hour="14">
<td class="hour">
<div>
<small>14.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2" class="danger"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4"></td>
<td data-weekday="5"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend danger"></td>
</tr>
<tr data-hour="15">
<td class="hour">
<div>
<small>15.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2" class="danger"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4"></td>
<td data-weekday="5" class="danger"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend danger"></td>
</tr>
<tr data-hour="16">
<td class="hour">
<div>
<small>16.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2" class="danger"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4"></td>
<td data-weekday="5" class="danger"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend danger"></td>
</tr>
<tr data-hour="17">
<td class="hour">
<div>
<small>17.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2" class="danger"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4"></td>
<td data-weekday="5" class="danger"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend danger"></td>
</tr>
<tr data-hour="18">
<td class="hour">
<div>
<small>18.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2" class="danger"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4"></td>
<td data-weekday="5" class="danger"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend"></td>
</tr>
<tr data-hour="19">
<td class="hour">
<div>
<small>19.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2" class="danger"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4"></td>
<td data-weekday="5" class="danger"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend"></td>
</tr>
<tr data-hour="20">
<td class="hour">
<div>
<small>20.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2" class="danger"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4"></td>
<td data-weekday="5" class="danger"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend"></td>
</tr>
<tr data-hour="21">
<td class="hour">
<div>
<small>21.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2" class="danger"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4"></td>
<td data-weekday="5" class="danger"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend"></td>
</tr>
<tr data-hour="22">
<td class="hour">
<div>
<small>22.00</small>
</div>
</td>
<td data-weekday="1"></td>
<td data-weekday="2" class="danger"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="4"></td>
<td data-weekday="5" class="danger"></td>
<td data-weekday="6" class="cal-weekend danger"></td>
<td data-weekday="0" class="cal-weekend"></td>
</tr>
Is it possible that the HTML rendered by scrapy-splash is wrong? The rest of the content seems correct; only this calendar table is off.
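To compare the rendered table against the live page cell by cell, it may help to reduce the markup to a set of blocked (weekday, hour) pairs. Below is a minimal sketch that uses only the standard library's html.parser and assumes the markup has exactly the shape pasted above (data-hour on each tr, data-weekday and an optional "danger" class on each td). The names CalendarParser, blocked_slots and SAMPLE are made up for illustration, and SAMPLE is a shortened stand-in for the real response body.

```python
from html.parser import HTMLParser

# Mapping described in the post: data-weekday 0..6 = Sunday..Saturday.
WEEKDAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday"]


class CalendarParser(HTMLParser):
    """Collects (weekday, hour) pairs whose cell carries the 'danger' class."""

    def __init__(self):
        super().__init__()
        self.current_hour = None   # data-hour of the row being parsed
        self.blocked = set()       # {(weekday, hour), ...}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr" and "data-hour" in attrs:
            self.current_hour = int(attrs["data-hour"])
        elif (tag == "td" and "data-weekday" in attrs
              and self.current_hour is not None):
            # A cell may carry several classes, e.g. "cal-weekend danger".
            classes = (attrs.get("class") or "").split()
            if "danger" in classes:
                self.blocked.add((int(attrs["data-weekday"]), self.current_hour))

    def handle_endtag(self, tag):
        if tag == "tr":
            self.current_hour = None


def blocked_slots(html):
    """Return the set of blocked (weekday, hour) pairs found in the HTML."""
    parser = CalendarParser()
    parser.feed(html)
    return parser.blocked


# Shortened, hypothetical sample in the same shape as the rows above.
SAMPLE = """
<tr data-hour="0">
<td class="hour"><div><small>00.00</small></div></td>
<td data-weekday="1"></td>
<td data-weekday="3" class="danger"></td>
<td data-weekday="0" class="cal-weekend danger"></td>
</tr>
<tr data-hour="18">
<td data-weekday="5" class="danger"></td>
<td data-weekday="0" class="cal-weekend"></td>
</tr>
"""

if __name__ == "__main__":
    for weekday, hour in sorted(blocked_slots(SAMPLE)):
        print(f"{WEEKDAY_NAMES[weekday]} {hour:02d}.00 is blocked")
```

Inside a Scrapy callback the same function could be fed response.text. Printing the pairs for the Splash output and checking them against what the browser shows should make it easier to see exactly which cells differ.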
Upvotes: 1
Reputation: 21241
https://dgaqgnnkkz5ef.cloudfront.net/assets/application-840c6707422c9d0ee7fb9005972e7c7201803d9c24bbcd23253e6ec7beedd6a1.js is the JS file where they get the data from. I don't have much time to inspect it, but you can research how they do it: search the file for js-occupancy-calendar
and rental_ad_occupancy_calendar/main
and you will get some idea.
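The search suggested here can be scripted once the JS file has been saved locally. This is only a rough sketch: find_marker_context and the radius parameter are hypothetical names, and SAMPLE_JS is an invented one-liner standing in for the real (much larger, minified) bundle, so it says nothing about how the site actually loads its data.

```python
import re

# The two strings named in the answer above.
MARKERS = ("js-occupancy-calendar", "rental_ad_occupancy_calendar/main")


def find_marker_context(js_source, markers=MARKERS, radius=40):
    """Return (marker, snippet) pairs with `radius` characters of context
    on each side of every occurrence of each marker."""
    hits = []
    for marker in markers:
        for match in re.finditer(re.escape(marker), js_source):
            start = max(0, match.start() - radius)
            end = min(len(js_source), match.end() + radius)
            hits.append((marker, js_source[start:end]))
    return hits


# Invented stand-in for the downloaded bundle; not the site's real code.
SAMPLE_JS = (
    'define("rental_ad_occupancy_calendar/main",function(){'
    'var el=$(".js-occupancy-calendar");});'
)

if __name__ == "__main__":
    for marker, snippet in find_marker_context(SAMPLE_JS, radius=20):
        print(marker, "->", snippet)
```

On the real file, the context around each hit is where a URL or a data attribute feeding the calendar would be expected to show up, if there is one.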
Upvotes: 0