alphazwest

Reputation: 4450

Locating Lazy Load Elements While Scrolling in PhantomJS in Python

I'm using Python and Selenium WebDriver to scrape data from a page that loads content dynamically as the user scrolls down (lazy loading). There are 30 data elements in total, but only 15 are displayed without scrolling.

After scrolling to the bottom of the page repeatedly until every element has loaded, I locate the elements and read their values as follows:

# Get All Data Items
all_data = self.driver.find_elements_by_css_selector('div[some-attribute="some-attribute-value"]')

# Iterate Through Each Item, Get Value
data_value_list = []
for d in all_data:
    # Get Value for Each Data item
    data_value = d.find_element_by_css_selector('div[class="target-class"]').get_attribute('target-attribute')

    # Save Data Value to List
    data_value_list.append(data_value)

When I execute the above code using ChromeDriver, while leaving the browser window up on my screen, I get all 30 data values to populate my data_value_list. When I execute the above code using ChromeDriver, with the window minimized, my list data_value_list is only populated with the initial 15 data values.

The same issue occurs while using PhantomJS, limiting my data_value_list to only the initially-visible data values on the page.

Is there a way to load these types of elements while the browser is minimized and, ideally, while using PhantomJS?

NOTE: I scroll down with an action chain, calling .send_keys(Keys.PAGE_DOWN).perform() a calculated number of times.

Upvotes: 0

Views: 1344

Answers (1)

eliotn

Reputation: 300

I had the exact same issue. The solution I found was to execute JavaScript in the browser that Selenium controls, forcing the scrollable element to scroll to the bottom.

Before putting the JavaScript command into Selenium, I recommend opening your page in Firefox and inspecting the elements to find the scrollable content. The element should encompass all of the dynamic rows, but it should not include the scrollbar. Then, after selecting the element with JavaScript, you can scroll it to the bottom by setting its scrollTop property to its scrollHeight property.

Next, test scrolling the content in the browser. Selecting the element by id is easiest if it has one, but other ways will work. To select an element with the id "scrollableContent" and scroll it to the bottom, execute the following code in your browser's JavaScript console:

e = document.getElementById('scrollableContent'); e.scrollTop = e.scrollHeight;

Of course, this only scrolls the content to its current bottom, so you will need to repeat it after new content loads if you need to scroll multiple times. Also, I don't know of a reliable way to find the exact element to scroll; for me it is trial and error.
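One way to narrow down the trial and error might be to ask the page which elements are actually scrollable. The sketch below is an assumption, not something I have used on the original page: find_scrollable_candidates and FIND_SCROLLABLE_JS are hypothetical names, and the heuristic (overflow-y set to auto or scroll, with content taller than the box) can miss pages where the document itself is what scrolls. It relies only on driver.execute_script.

```python
# JavaScript (ES5, so it should also run in older engines) that lists
# elements which look scrollable. This is a heuristic, not a guarantee.
FIND_SCROLLABLE_JS = """
var out = [];
var all = document.querySelectorAll('*');
for (var i = 0; i < all.length; i++) {
  var el = all[i];
  var oy = window.getComputedStyle(el).overflowY;
  if ((oy === 'auto' || oy === 'scroll') && el.scrollHeight > el.clientHeight) {
    out.push({tag: el.tagName, id: el.id, cls: el.getAttribute('class')});
  }
}
return out;
"""


def find_scrollable_candidates(driver):
    """Return a list of dicts describing elements that appear scrollable."""
    return driver.execute_script(FIND_SCROLLABLE_JS)
```

Any candidate with a non-empty id can then be tried with the scrollTop = scrollHeight snippet above.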

This is some code I tried out. However, I feel it can be improved, and it should be for applications that test code or that scrape pages whose loading behavior is unpredictable. I couldn't figure out how to explicitly wait until more elements were loaded (maybe get the number of elements, scroll to the bottom, wait for at least one more element to show up, and exit the loop if none does), so I hardcoded 5 scroll events and used time.sleep. time.sleep is ugly and can lead to issues, partly because the right delay depends on the speed of your machine.

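import time

def scrollElementToBottom(driver, element_id):
    # Give the page a moment to render before the first scroll.
    time.sleep(.2)
    for i in range(5):
        # Scroll the container to its current bottom; the id is passed as a script argument.
        driver.execute_script("var e = document.getElementById(arguments[0]); e.scrollTop = e.scrollHeight;", element_id)
        time.sleep(.2)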

The caveat is that I only tested the above solution with the Firefox driver, but I see no reason why it shouldn't work with your setup.
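The explicit-wait idea mentioned in this answer (count the elements, scroll, wait for more to appear, stop when none do) could be sketched roughly as follows. This is a minimal sketch under assumptions, not tested against the original page: scroll_until_loaded, its parameters, and the default timeout values are hypothetical, and it uses a plain polling loop instead of Selenium's WebDriverWait so that it needs only the standard library and driver.execute_script.

```python
import time

# Count matching items and scroll the container, both via JavaScript.
COUNT_JS = "return document.querySelectorAll(arguments[0]).length;"
SCROLL_JS = (
    "var e = document.getElementById(arguments[0]);"
    "e.scrollTop = e.scrollHeight;"
)


def scroll_until_loaded(driver, container_id, item_selector,
                        timeout=5.0, poll=0.2, max_rounds=50):
    """Scroll the container until a scroll produces no new items.

    Returns the final number of elements matching item_selector.
    All parameter names and defaults here are illustrative assumptions.
    """
    count = driver.execute_script(COUNT_JS, item_selector)
    for _ in range(max_rounds):
        driver.execute_script(SCROLL_JS, container_id)
        deadline = time.monotonic() + timeout
        # Poll until the item count grows or the timeout expires.
        while True:
            new_count = driver.execute_script(COUNT_JS, item_selector)
            if new_count > count or time.monotonic() >= deadline:
                break
            time.sleep(poll)
        if new_count == count:
            break  # nothing new appeared; assume the end was reached
        count = new_count
    return count
```

With the question's markup, a call might look like scroll_until_loaded(driver, 'scrollableContent', 'div[some-attribute="some-attribute-value"]'), followed by the usual element lookup. The timeout still has to be long enough for the slowest load, so this reduces the reliance on fixed sleeps rather than removing it.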

Upvotes: 1
