Reputation: 700
I am working on website automation and want to navigate through the result pages. The problem is that the website appears to be built with Angular: the pagination links do not point to a URL, they call a JS function from a click handler.
HTML Code is:
<li ng-if="directionLinks" ng-class="{ disabled : pagination.current == pagination.last }" class="ng-scope"><a href="" ng-click="setCurrent(pagination.current + 1)" class="xh-highlight">›</a></li>
Edited:
Website Link: https://jobee.pk/jobs-in-pakistan
Code Tried so far:
from selenium import webdriver
import time

class JobeePK:
    def __init__(self):
        # self.url = ""
        pass

    def driver(self):
        driver = webdriver.Chrome()
        driver.maximize_window()
        time.sleep(1)
        return driver

    # https://www.rozee.pk/job/jsearch/q/all/fc/1185/fpn/
    def extractData(self, search_link, total_pages):
        driver = self.driver()
        driver.get(search_link)
        time.sleep(5)
        for page_number in range(0, total_pages):
            driver.find_element_by_css_selector()
            time.sleep(10)

if __name__ == '__main__':
    jb = JobeePK()
    url = "https://jobee.pk/jobs-in-pakistan"
    total_pages = 128
    jb.extractData(url, total_pages)
Please suggest a solution to this problem. Thanks.
Upvotes: 0
Views: 405
Reputation: 4482
In such cases, it is always worth taking a closer look at the page to understand how the data is actually updated.
I did so by opening the console in Firefox and looking at the XHR network traffic.
Interesting: the page gets its results from an endpoint we can identify.
It returns JSON data, which is great:
{'totalJobs': 2541,
'jobs': [{'location': [{'jobLocationID': 0,
'jobID': 24986,
'countryID': 0,
'country': 'Pakistan',
'cityID': None,
'cityText': 'Karachi',
'jobShiftID': 0,
'name': None}],
'jobID': 24986,
'jobIDEncrypted': '26cfb27ee6b2abad',
'title': 'Marketing Officer - Freelancer',
'jobDescription': '<p>We are growing, energetic, and highly-reputed Public Relation (PR) and Digital Marketing Agency.<br />\nCurrently, we are looking for ...
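To make the structure of that response concrete, here is a minimal sketch that reads a few fields from one record and works out how many pages a given page size implies. The `sample` dict is a hand-trimmed copy of the response above, and `summarize` and `page_count` are hypothetical helper names used only for illustration.

```python
import math

# Trimmed copy of the sample response shown above (most keys omitted).
sample = {
    'totalJobs': 2541,
    'jobs': [{
        'location': [{'country': 'Pakistan', 'cityText': 'Karachi'}],
        'jobID': 24986,
        'title': 'Marketing Officer - Freelancer',
    }],
}

def summarize(job):
    """Return (jobID, title, list of city names) for one job record."""
    # 'location' is a list of dicts; treat a missing or None value as empty.
    cities = [loc.get('cityText') for loc in job.get('location') or []]
    return job['jobID'], job['title'], cities

def page_count(total_jobs, page_size):
    """Number of requests needed to fetch total_jobs at page_size per page."""
    return math.ceil(total_jobs / page_size)

print(summarize(sample['jobs'][0]))
print(page_count(sample['totalJobs'], 200))
```

With 2541 jobs and 200 results per request, this gives 13 pages, so a handful of requests should be enough instead of clicking through the paginated UI.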
Let's use this to write our script:
import json
import math

import pandas as pd
import requests

# The scraping function
def getJobs(pageNumber):
    # Defining the headers
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:67.0) Gecko/20100101 Firefox/67.0',
        'Accept': 'application/json, text/plain, */*',
        'Accept-Language': 'fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3',
        'X-Requested-With': 'XMLHttpRequest',
        'Content-Type': 'application/json;charset=utf-8',
        'DNT': '1',
        'Connection': 'keep-alive',
        'Referer': 'https://jobee.pk/jobs-in-pakistan',
        'Pragma': 'no-cache'
    }
    # Setting the right params for the request we will make; pageSize is set to 200 (results per page)
    data = {"model":{"titles":[],"cities":[],"shifts":[],"experinces":[],"careerLevels":[],"functionalAreas":[],"genders":[],"industries":[],"degreeLevels":[],"companies":[]},"pageNumber":1,"pageSize":200}
    # Updating the page number
    data['pageNumber'] = pageNumber
    data = json.dumps(data)
    # Collecting the results
    response = requests.post('https://jobee.pk/job/jobsearch', headers=headers, data=data)
    # Just in case the response is not valid JSON
    try:
        return json.loads(response.content)
    except ValueError:
        return {'jobs': []}

# Then let's get the number of pages from page 1
data = getJobs(1)
totalJobs = data['totalJobs']
number_of_pages = math.ceil(totalJobs / 200)

# Initializing our job list
jobs_list = []

# Looping through the pages
for pageNumber in range(1, number_of_pages + 1):
    results = getJobs(pageNumber)
    # If there are no results we end the loop
    if len(results['jobs']) == 0:
        break
    else:
        # We add the records under the ['jobs'] key to our list
        jobs_list += results['jobs']
        print('Page', pageNumber, '-', len(jobs_list), "jobs collected")

# Let's have a look at the data in a dataframe
df = pd.DataFrame(jobs_list)
print(df)
Output
Page 1 - 200 jobs collected
Page 2 - 400 jobs collected
Page 3 - 600 jobs collected
...
+----+----------------------+--------------------+-------------+----------------------+------------------+----------------------------------------------------+--------+-------------------+----------------------------------------------------+----------------+--------------------+--------------------------+--------------------------+----------------+----------------------------------------------------+--------------------------------------------+----------------------------------------+-----------+
| | appliedByDate | companyName | experience | expiredDate | isSalaryVisible | jobDescription | jobID | jobIDEncrypted | location | logo | numberOfPositions | postDate | publishDate | salaryRange | skills | title | titleWithoutSpecialCharacters | viewCount |
+----+----------------------+--------------------+-------------+----------------------+------------------+----------------------------------------------------+--------+-------------------+----------------------------------------------------+----------------+--------------------+--------------------------+--------------------------+----------------+----------------------------------------------------+--------------------------------------------+----------------------------------------+-----------+
| 0 | 0001-01-01T00:00:00 | Custom House | Fresh | 2019-09-19T00:00:00 | True | <p>We require Mean Stack Developer Interns who... | 27925 | a0962bea0bc174a1 | [{'jobLocationID': 0, 'jobID': 27925, 'country... | 14564Logo.jpg | 3 | 2019-06-21T14:04:01.363 | 2019-06-21T19:26:24.213 | 5000 - 10000 | [AngularJs, Mongo DB, JavaScript, Node Js, Mea... | Mean Stack Developer - Intern | Mean-Stack-Developer-Intern | 10 |
| 1 | 0001-01-01T00:00:00 | Custom House | Fresh | 2019-09-19T00:00:00 | True | <p>We requires SEO, Digital Marketing and Grap... | 27924 | 81e4e7f7d672dffd | [{'jobLocationID': 0, 'jobID': 27924, 'country... | 14564Logo.jpg | 2 | 2019-06-21T14:00:26.45 | 2019-06-21T19:25:04.493 | 5000 - 10000 | [Graphic Design, Search Engine Optimization (S... | SEO Executive / Graphic Designer - Intern | SEO-Executive-Graphic-Designer-Intern | 10 |
| 2 | 0001-01-01T00:00:00 | Printoscan Lahore | 1 Year | 2019-09-19T00:00:00 | True | <p>We require an <strong>Accounts Assistant / ... | 27923 | 137a257e9e5bbb5d | [{'jobLocationID': 0, 'jobID': 27923, 'country... | None | 1 | 2019-06-21T13:59:37.373 | 2019-06-21T19:19:07.36 | 15000 - 20000 | [Accounts Services, Administrative Skills, Acc... | Accounts Assistant / Administrator | Accounts-Assistant-Administrator | 6 |
+----+----------------------+--------------------+-------------+----------------------+------------------+----------------------------------------------------+--------+-------------------+----------------------------------------------------+----------------+--------------------+--------------------------+--------------------------+----------------+----------------------------------------------------+--------------------------------------------+----------------------------------------+-----------+
This is what we wanted.
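One thing the table shows is that the `location` column still holds a list of dicts. If a flat column is easier to work with, a sketch along these lines could be applied to `jobs_list` before building the dataframe. The helper name `flatten_locations`, the `cities` key, and the two example records are made up for illustration; only the `location` / `cityText` shape is taken from the output above, and the handling of a `None` location is an assumption.

```python
def flatten_locations(jobs):
    """Return copies of the job dicts with 'location' replaced by a 'cities' string."""
    flat = []
    for job in jobs:
        # Copy every field except the nested 'location' list.
        row = {key: value for key, value in job.items() if key != 'location'}
        # Assumption: 'location' may be missing or None; skip entries without a city.
        names = [loc['cityText'] for loc in job.get('location') or [] if loc.get('cityText')]
        row['cities'] = ', '.join(names)
        flat.append(row)
    return flat

# Two hypothetical records shaped like the API output (values are invented).
example_jobs = [
    {'jobID': 1, 'title': 'Example job A',
     'location': [{'cityText': 'Lahore'}, {'cityText': 'Karachi'}]},
    {'jobID': 2, 'title': 'Example job B', 'location': None},
]

rows = flatten_locations(example_jobs)
print(rows[0]['cities'])
print(sorted(rows[1].keys()))
```

The result is a plain list of dicts, so it can be passed to `pd.DataFrame` in the same way as `jobs_list`.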
Upvotes: 1