Reputation: 817
I keep running into walls. Can anybody tell me how to crawl multiple pages from one website using Selenium without repeating my code over and over?
Here is my current code:
import time
import unittest

from selenium import webdriver

RegionIDArray = ['de/7132/New-York-City/d687-allthingstodo', 'de/7132/London/d737-allthingstodo']

class Crawling(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Firefox()
        self.driver.set_window_size(10, 10)
        self.base_url = "http://www.jsox.de/"
        self.accept_next_alert = True

    def test_sel(self):
        driver = self.driver
        delay = 3
        for reg in RegionIDArray:
            page = 0
        driver.get(self.base_url + str(reg))
        for i in range(1,4):
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(2)
If I run this code, I only get the results for London, not for New York.
I could do this manually by repeating my code for each page and then concatenating the resulting dataframes, but that seems very unpythonic. Does anyone have a faster way or any advice?
Any feedback is appreciated :)
EDIT
I modified my code according to the comment to Anil. Selenium now opens the pages for both New York and London, but it only returns the results for London. Any idea what the reason could be?
Modified code:
RegionIDArray = ['de/7132/New-York-City/d687-allthingstodo', 'de/7132/London/d737-allthingstodo']

class Crawling(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Firefox()
        self.driver.set_window_size(10, 10)
        self.base_url = "http://www.jsox.de/"
        self.accept_next_alert = True

    def test_sel(self):
        driver = self.driver
        delay = 3
        for reg in RegionIDArray:
            page = 0
            driver.get(self.base_url + str(reg))
            for i in range(1,4):
                driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
                time.sleep(2)
Upvotes: 2
Views: 988
Reputation: 1609
Python loops are controlled by indentation: only the lines indented under the for statement run on each iteration.
for i in range(1,4):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)
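A small browser-free illustration of that point may help. The names regions, visited_buggy and visited_fixed are made up for this demo and are not from the question; the list append stands in for the driver.get() call:

```python
regions = ["New-York-City", "London"]

# Buggy layout: only `page = 0` is indented, so it is the whole loop body.
visited_buggy = []
for reg in regions:
    page = 0
visited_buggy.append(reg)  # runs once, after the loop, with the last value of reg

# Fixed layout: the append is indented too, so it runs for every region.
visited_fixed = []
for reg in regions:
    page = 0
    visited_fixed.append(reg)

print(visited_buggy)  # ['London']
print(visited_fixed)  # ['New-York-City', 'London']
```

The same rule applies to the scroll loop: if it is not indented under the outer for, it only runs for the last page.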
Upvotes: 1
Reputation: 661
Your for loop

for reg in RegionIDArray:
    page = 0

will loop through all list items, and when it exits, reg points to the last item, i.e. London. That is why you get only the last item.
Instead, you just need to put the driver part inside the for loop:
def test_sel(self):
    driver = self.driver
    delay = 3
    for reg in RegionIDArray:
        page = 0
        driver.get(self.base_url + str(reg))
        for i in range(1,4):
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(2)
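If both pages open but you still only see London's results, the code that reads the results is probably still outside the loop. That part is not shown in the question, so this is a guess. A minimal sketch, assuming you want the page source per region: crawl_regions and its parameters are hypothetical names, and driver can be any object with get(), execute_script() and page_source, such as a Selenium WebDriver:

```python
import time


def crawl_regions(driver, base_url, regions, scrolls=3, pause=2.0):
    """Visit every region and return {region: page source after scrolling}."""
    results = {}
    for reg in regions:
        driver.get(base_url + reg)
        for _ in range(scrolls):
            # Scroll to the bottom so lazily loaded content has a chance to appear.
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(pause)
        # Store the result inside the loop, before the next driver.get()
        # replaces the current page.
        results[reg] = driver.page_source
    return results
```

With a real browser you would call it as crawl_regions(self.driver, self.base_url, RegionIDArray) and then parse each stored page separately, so nothing is overwritten by the last region.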
Upvotes: 1