Serious Ruffy

Reputation: 817

Crawl multiple pages of a website using Selenium (Python 3)

I keep running into walls. Can anybody tell me how to crawl multiple pages of one website using Selenium without having to repeat my code over and over?

Here is my current code:

RegionIDArray = ['de/7132/New-York-City/d687-allthingstodo',  'de/7132/London/d737-allthingstodo']

class Crawling(unittest.TestCase):
 def setUp(self):
     self.driver = webdriver.Firefox()
     self.driver.set_window_size(10, 10)
     self.base_url = "http://www.jsox.de/"
     self.accept_next_alert = True


 def test_sel(self):
     driver = self.driver
     delay = 3
     for reg in RegionIDArray:
        page = 0
     driver.get(self.base_url + str(reg))
     for i in range(1,4):
         driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
         time.sleep(2)

If I run this code, I only get the results for London, not for New York.

I could do this manually by repeating my code for each page and then concatenating the resulting dataframes, but that seems very unpythonic. Does anyone know a faster way, or have any advice?

Any feedback is appreciated:)

EDIT

I modified my code following Anil's comment. Selenium now opens the pages for both New York and London, but it still only returns the results for London. Any idea what the reason could be?

Modified code:

 RegionIDArray = ['de/7132/New-York-City/d687-allthingstodo', 'de/7132/London/d737-allthingstodo']


 class Crawling(unittest.TestCase):
     def setUp(self):
         self.driver = webdriver.Firefox()
         self.driver.set_window_size(10, 10)
         self.base_url = "http://www.jsox.de/"
         self.accept_next_alert = True


     def test_sel(self):
         driver = self.driver
         delay = 3
         for reg in RegionIDArray:
             page = 0
             driver.get(self.base_url + str(reg))
             for i in range(1,4):
             driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
             time.sleep(2)

Upvotes: 2

Views: 988

Answers (2)

Shamik

Reputation: 1609

Python loops are delimited by indentation, so the body of the inner loop has to be indented one level deeper than its for statement:

for i in range(1,4):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)
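
To see how strict Python is about this, here is a small sketch that asks the interpreter to compile a loop with and without an indented body. The helper name `compiles` and the two snippet strings are made up for illustration and are not part of the question's code.

```python
# Hypothetical helper: reports whether a source snippet compiles.
def compiles(source):
    try:
        compile(source, "<snippet>", "exec")
        return True
    except IndentationError:
        return False


# Body at the same level as the for statement.
flat = "for i in range(1, 4):\nlast = i\n"
# Body indented one level deeper than the for statement.
nested = "for i in range(1, 4):\n    last = i\n"

print(compiles(flat))    # False
print(compiles(nested))  # True
```

The first snippet is rejected before anything runs, so a loop written that way never gets as far as scrolling the page.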

Upvotes: 1

Termi

Reputation: 661

Your for loop

for reg in RegionIDArray:
    page = 0

will loop through all list items, and when it exits, reg points to the last item, i.e. London. That is why you only get the last item.
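
This can be reproduced without a browser. In the sketch below, appending to a list stands in for the call to `driver.get`; the list names and the shortened region strings are illustrative only.

```python
regions = ["New-York-City", "London"]

# The "get" happens after the loop has finished, so it only sees
# the last value that reg was bound to.
visited_after = []
for reg in regions:
    page = 0
visited_after.append(reg)

# The "get" happens inside the loop, once per region.
visited_inside = []
for reg in regions:
    page = 0
    visited_inside.append(reg)

print(visited_after)   # ['London']
print(visited_inside)  # ['New-York-City', 'London']
```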

Instead, you just need to put the driver part inside the for loop:

def test_sel(self):
     driver = self.driver
     delay = 3
     for reg in RegionIDArray:
         page = 0
         driver.get(self.base_url + str(reg))
         for i in range(1,4):
             driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
             time.sleep(2)
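
If both pages open but only London's results survive, one possible cause is that the results are read or stored after the loop, or overwritten on each pass. The question does not show how the results are extracted, so the following is only a sketch under that assumption: it keeps `driver.page_source` for each region in a dict. `crawl_regions` and `FakeDriver` are hypothetical names; the fake driver exists only so the example runs without a browser.

```python
import time


def crawl_regions(driver, base_url, regions, scrolls=3, pause=2.0):
    """Visit each region page, scroll it, and keep one result per region."""
    results = {}
    for reg in regions:
        driver.get(base_url + reg)
        for _ in range(scrolls):
            driver.execute_script(
                "window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(pause)
        # Assumption: the page source is what should be kept per region.
        results[reg] = driver.page_source
    return results


class FakeDriver:
    """Hypothetical stand-in that mimics the three driver members used above."""

    def __init__(self):
        self.page_source = ""

    def get(self, url):
        self.page_source = "<html>" + url + "</html>"

    def execute_script(self, script):
        return None


regions = ["de/7132/New-York-City/d687-allthingstodo",
           "de/7132/London/d737-allthingstodo"]
pages = crawl_regions(FakeDriver(), "http://www.jsox.de/", regions, pause=0)
print(len(pages))  # 2
```

With a real browser you would pass `webdriver.Firefox()` instead of `FakeDriver()` and keep the default pause so the page has time to load more items after each scroll.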

Upvotes: 1
