GigaByte

Reputation: 700

How can I navigate through pagination using Selenium with Python?

I am working on website automation and want to navigate through different pages. The problem is that the website appears to be built with Angular: the pagination link calls a JavaScript function through an ng-click handler.

HTML Code is:

<li ng-if="directionLinks" ng-class="{ disabled : pagination.current == pagination.last }" class="ng-scope"><a href="" ng-click="setCurrent(pagination.current + 1)" class="xh-highlight">›</a></li>
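For reference, one way to drive such an Angular pagination link from Selenium is to wait for it to become clickable and target its ng-click attribute rather than the xh-highlight class (that class is injected by a browser extension, not by the site). This is only a sketch; the selector is an assumption based on the markup above:

```python
# CSS selector guessed from the ng-click attribute in the markup above;
# the "xh-highlight" class is extension-injected and won't exist for Selenium.
NEXT_LINK_CSS = "a[ng-click*='setCurrent']"

def click_next_page(driver, timeout=10):
    """Wait for the Angular 'next' link to be clickable, then click it."""
    # Imports kept local so the module loads even without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    link = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, NEXT_LINK_CSS))
    )
    link.click()
```

An explicit wait matters here because Angular re-renders the list after each click, so a fixed `time.sleep` is both slow and fragile.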

Edited:

Website Link: https://jobee.pk/jobs-in-pakistan

Code Tried so far:

from selenium import webdriver
import time
class JobeePK:
    def __init__(self):
        # self.url = ""
        pass
    def driver(self):
        driver = webdriver.Chrome()
        driver.maximize_window()
        time.sleep(1)
        return driver

    # https://www.rozee.pk/job/jsearch/q/all/fc/1185/fpn/
    def extractData(self,search_link, total_pages):
        driver = self.driver()
        driver.get(search_link)
        time.sleep(5)

        for page_number in range(0, total_pages):
            # Click the Angular "next" link via its ng-click attribute
            next_link = driver.find_element_by_css_selector(
                "a[ng-click*='setCurrent']")
            next_link.click()
            time.sleep(10)



if __name__ == '__main__':
    jb = JobeePK()
    url = "https://jobee.pk/jobs-in-pakistan"
    total_pages = 128
    jb.extractData(url, total_pages)

Please suggest a solution to this problem. Thanks.

Upvotes: 0

Views: 405

Answers (1)

Sebastien D

Reputation: 4482

In such cases, it is always worth taking a closer look at the page to understand how the data is actually updated.

I did so by opening the developer console in Firefox and looking at the XHR network traffic.

[Screenshot: Firefox network panel showing the XHR request made by the page]

... interesting. The page is getting its results from an endpoint we could identify.

It returns JSON data, which is great:

{'totalJobs': 2541,
 'jobs': [{'location': [{'jobLocationID': 0,
     'jobID': 24986,
     'countryID': 0,
     'country': 'Pakistan',
     'cityID': None,
     'cityText': 'Karachi',
     'jobShiftID': 0,
     'name': None}],
   'jobID': 24986,
   'jobIDEncrypted': '26cfb27ee6b2abad',
   'title': 'Marketing Officer - Freelancer',
   'jobDescription': '<p>We are growing, energetic, and highly-reputed Public Relation (PR) and Digital Marketing Agency.<br />\nCurrently, we are looking for ...
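Each record in `jobs` can then be flattened down to the fields you need. A small sketch, using a record trimmed from the sample response above:

```python
# Sample record trimmed from the JSON response shown above.
sample_job = {
    "jobID": 24986,
    "title": "Marketing Officer - Freelancer",
    "location": [{"country": "Pakistan", "cityText": "Karachi"}],
}

def summarize(job):
    """Flatten one job record to a (title, city, country) tuple."""
    loc = job["location"][0] if job.get("location") else {}
    return (job["title"], loc.get("cityText"), loc.get("country"))

print(summarize(sample_job))
# → ('Marketing Officer - Freelancer', 'Karachi', 'Pakistan')
```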

Let's use this to write our script:

import requests
import json
import math
import pandas as pd

#The scraping function
def getJobs(pageNumber):

    #Defining the headers
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:67.0) Gecko/20100101 Firefox/67.0',
        'Accept': 'application/json, text/plain, */*',
        'Accept-Language': 'fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3',
        'X-Requested-With': 'XMLHttpRequest',
        'Content-Type': 'application/json;charset=utf-8',
        'DNT': '1',
        'Connection': 'keep-alive',
        'Referer': 'https://jobee.pk/jobs-in-pakistan',
        'Pragma': 'no-cache'      
    }

    #Setting the right params for the request we will make; pageSize is set to 200 (results per page)
    data = {"model":{"titles":[],"cities":[],"shifts":[],"experinces":[],"careerLevels":[],"functionalAreas":[],"genders":[],"industries":[],"degreeLevels":[],"companies":[]},"pageNumber":1,"pageSize":200}

    #Updating the page number
    data['pageNumber'] = pageNumber
    data = json.dumps(data)

    #Collecting the results
    response = requests.post('https://jobee.pk/job/jobsearch', headers=headers, data=data)

    #Just in case an error shows up
    try:
        return json.loads(response.content)
    except ValueError:
        return {'jobs': []}

#First, request page 1 to get the total number of jobs
data = getJobs(1)
totalJobs = data['totalJobs']
number_of_pages = math.ceil(totalJobs /200)

#Initializing our job list
jobs_list = []

#Looping through the pages
for pageNumber in range(1,number_of_pages + 1):
    results  = getJobs(pageNumber)

    #If no results, we end the loop
    if len(results['jobs']) == 0:
        break
    else:
        #Append this page's jobs to our list
        jobs_list += results['jobs']
        print('Page', pageNumber, '-', len(jobs_list), "jobs collected")

#Let's have a look at the data in a dataframe
df = pd.DataFrame(jobs_list)
print(df)

Output

Page 1 - 200 jobs collected
Page 2 - 400 jobs collected
Page 3 - 600 jobs collected
...

+----+----------------------+--------------------+-------------+----------------------+------------------+----------------------------------------------------+--------+-------------------+----------------------------------------------------+----------------+--------------------+--------------------------+--------------------------+----------------+----------------------------------------------------+--------------------------------------------+----------------------------------------+-----------+
|    |    appliedByDate     |    companyName     | experience  |     expiredDate      | isSalaryVisible  |                  jobDescription                    | jobID  |  jobIDEncrypted   |                     location                       |     logo       | numberOfPositions  |        postDate          |       publishDate        |  salaryRange   |                      skills                        |                   title                    |     titleWithoutSpecialCharacters      | viewCount |
+----+----------------------+--------------------+-------------+----------------------+------------------+----------------------------------------------------+--------+-------------------+----------------------------------------------------+----------------+--------------------+--------------------------+--------------------------+----------------+----------------------------------------------------+--------------------------------------------+----------------------------------------+-----------+
| 0  | 0001-01-01T00:00:00  | Custom House       | Fresh       | 2019-09-19T00:00:00  | True             | <p>We require Mean Stack Developer Interns who...  | 27925  | a0962bea0bc174a1  | [{'jobLocationID': 0, 'jobID': 27925, 'country...  | 14564Logo.jpg  |                 3  | 2019-06-21T14:04:01.363  | 2019-06-21T19:26:24.213  | 5000 - 10000   | [AngularJs, Mongo DB, JavaScript, Node Js, Mea...  | Mean Stack Developer - Intern              | Mean-Stack-Developer-Intern            |        10 |
| 1  | 0001-01-01T00:00:00  | Custom House       | Fresh       | 2019-09-19T00:00:00  | True             | <p>We requires SEO, Digital Marketing and Grap...  | 27924  | 81e4e7f7d672dffd  | [{'jobLocationID': 0, 'jobID': 27924, 'country...  | 14564Logo.jpg  |                 2  | 2019-06-21T14:00:26.45   | 2019-06-21T19:25:04.493  | 5000 - 10000   | [Graphic Design, Search Engine Optimization (S...  | SEO Executive / Graphic Designer - Intern  | SEO-Executive-Graphic-Designer-Intern  |        10 |
| 2  | 0001-01-01T00:00:00  | Printoscan Lahore  | 1 Year      | 2019-09-19T00:00:00  | True             | <p>We require an <strong>Accounts Assistant / ...  | 27923  | 137a257e9e5bbb5d  | [{'jobLocationID': 0, 'jobID': 27923, 'country...  | None           |                 1  | 2019-06-21T13:59:37.373  | 2019-06-21T19:19:07.36   | 15000 - 20000  | [Accounts Services, Administrative Skills, Acc...  | Accounts Assistant / Administrator         | Accounts-Assistant-Administrator       |         6 |
+----+----------------------+--------------------+-------------+----------------------+------------------+----------------------------------------------------+--------+-------------------+----------------------------------------------------+----------------+--------------------+--------------------------+--------------------------+----------------+----------------------------------------------------+--------------------------------------------+----------------------------------------+-----------+

This is what we wanted.
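As a quick sanity check on the page arithmetic: the endpoint reported 2541 total jobs, so with a pageSize of 200 the ceiling division works out to 13 requests:

```python
import math

# Values from the first response above: 2541 total jobs, 200 per page.
total_jobs = 2541
page_size = 200
number_of_pages = math.ceil(total_jobs / page_size)
print(number_of_pages)  # → 13
```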

Upvotes: 1
