huahz

Reputation: 21

Scrape dynamic content using Python Scrapy

I would like to scrape the 'calendar' content on this page: https://gomore.dk/lejebil/27035

(Image: the calendar information I want)

Can I crawl this content with Python Scrapy alone, without Selenium? I can't find any relevant request in the browser's Network tab. Thanks!

Upvotes: 0

Views: 116

Answers (2)

huahz

Reputation: 21

After half a day of research, I found that I can use scrapy-splash to retrieve the JS-rendered content, which gives me the full content of the page, including the calendar information. However, the calendar information does not match what I expect: for example, hour 1 on weekday 1 should be "danger", but it is not.

The page uses data-hour to represent each of the 24 hours of the day, and data-weekday 0 - 6 to represent Sunday, Monday, ..., Saturday. class="danger" marks a slot as blocked (shown in red on the page).

    <tr data-hour="0">
      <td class="hour">
        <div>
          <small>00.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="1">
      <td class="hour">
        <div>
          <small>01.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="2">
      <td class="hour">
        <div>
          <small>02.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="3">
      <td class="hour">
        <div>
          <small>03.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="4">
      <td class="hour">
        <div>
          <small>04.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="5">
      <td class="hour">
        <div>
          <small>05.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="6">
      <td class="hour">
        <div>
          <small>06.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="7">
      <td class="hour">
        <div>
          <small>07.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="8">
      <td class="hour">
        <div>
          <small>08.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="9">
      <td class="hour">
        <div>
          <small>09.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="10">
      <td class="hour">
        <div>
          <small>10.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="11">
      <td class="hour">
        <div>
          <small>11.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="12">
      <td class="hour">
        <div>
          <small>12.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="13">
      <td class="hour">
        <div>
          <small>13.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="14">
      <td class="hour">
        <div>
          <small>14.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="15">
      <td class="hour">
        <div>
          <small>15.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="16">
      <td class="hour">
        <div>
          <small>16.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="17">
      <td class="hour">
        <div>
          <small>17.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="18">
      <td class="hour">
        <div>
          <small>18.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend"></td>
    </tr>

    <tr data-hour="19">
      <td class="hour">
        <div>
          <small>19.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend"></td>
    </tr>

    <tr data-hour="20">
      <td class="hour">
        <div>
          <small>20.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend"></td>
    </tr>

    <tr data-hour="21">
      <td class="hour">
        <div>
          <small>21.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend"></td>
    </tr>

    <tr data-hour="22">
      <td class="hour">
        <div>
          <small>22.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend"></td>
    </tr>
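
To compare the rendered table with what the browser shows, it can help to turn the markup above into a plain list of blocked slots. The following is a minimal sketch using only the standard library's html.parser; the names CalendarParser and blocked_slots and the shortened sample string are my own, not anything from the site, and it assumes the data-hour / data-weekday / class="danger" scheme described above.

```python
from html.parser import HTMLParser

# data-weekday 0 - 6 maps to Sunday .. Saturday, as described above.
WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"]


class CalendarParser(HTMLParser):
    """Collects (weekday, hour) pairs for cells whose class list has 'danger'."""

    def __init__(self):
        super().__init__()
        self.current_hour = None  # hour of the <tr> currently being read
        self.blocked = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr" and "data-hour" in attrs:
            self.current_hour = int(attrs["data-hour"])
        elif (tag == "td" and "data-weekday" in attrs
              and self.current_hour is not None):
            # class may hold several names, e.g. "cal-weekend danger"
            classes = (attrs.get("class") or "").split()
            if "danger" in classes:
                self.blocked.add((int(attrs["data-weekday"]), self.current_hour))

    def handle_endtag(self, tag):
        if tag == "tr":
            self.current_hour = None


def blocked_slots(html):
    """Return the set of (weekday, hour) pairs marked as blocked."""
    parser = CalendarParser()
    parser.feed(html)
    parser.close()
    return parser.blocked


# Shortened, hypothetical sample in the same shape as the rows above.
sample = """
<tr data-hour="18">
  <td class="hour"><div><small>18.00</small></div></td>
  <td data-weekday="1"></td>
  <td data-weekday="2" class="danger"></td>
  <td data-weekday="6" class="cal-weekend danger"></td>
  <td data-weekday="0" class="cal-weekend"></td>
</tr>
"""

for weekday, hour in sorted(blocked_slots(sample)):
    print(WEEKDAYS[weekday], f"{hour:02d}.00")
```

Running the same function on the HTML returned by scrapy-splash and on the HTML copied from the browser's element inspector, then diffing the two sets, should show exactly which cells differ.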

Is it possible for the HTML rendered by scrapy-splash to be wrong? The rest of the content seems correct; only this calendar table is off.

Upvotes: 1

Umair Ayub

Reputation: 21241

This is the JS file they are getting the data from: https://dgaqgnnkkz5ef.cloudfront.net/assets/application-840c6707422c9d0ee7fb9005972e7c7201803d9c24bbcd23253e6ec7beedd6a1.js

I don't have much time to inspect it, but you can do more research on how they are doing it. Search for js-occupancy-calendar and rental_ad_occupancy_calendar/main and you will get some idea.
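
Since a bundled JS file is usually one very long minified line, a small helper that prints the text around each match can make that search easier. This is only a rough sketch: find_contexts is a name I made up, and sample_js is an invented stand-in, not the real contents of the file. In practice you would download the file yourself and pass its text in.

```python
import re


def find_contexts(js_text, needle, width=40):
    """Return each occurrence of `needle` with up to `width` chars on each side."""
    contexts = []
    for match in re.finditer(re.escape(needle), js_text):
        start = max(0, match.start() - width)
        end = min(len(js_text), match.end() + width)
        contexts.append(js_text[start:end])
    return contexts


# Invented stand-in for the minified bundle; the real file will look different.
sample_js = (
    'define("rental_ad_occupancy_calendar/main",function(){'
    'var el=$(".js-occupancy-calendar");});'
)

for needle in ("js-occupancy-calendar", "rental_ad_occupancy_calendar/main"):
    for snippet in find_contexts(sample_js, needle):
        print(needle, "->", snippet)
```

The snippets around each match are a starting point for finding where the calendar data is requested or computed.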

Upvotes: 0
