Reputation: 71
I have this problem that I want to automate a script. And in passed projects I've used python scheduler for this. But for this project I'm unsure how to handle this.
The problem is that the code works with login details that are outside the code and entered in the commandline when launching the script.
ex. python scriptname.py [email protected] password
How can I automate this with python scheduler? The code that is in 'scriptname.py' is:
//LinkedBot.py
import argparse, os, time
import urlparse, random
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
def getPeopleLinks(page):
links = []
for link in page.find_all('a'):
url = link.get('href')
if url:
if 'profile/view?id=' in url:
links.append(url)
return links
def getJobLinks(page):
links = []
for link in page.find_all('a'):
url = link.get('href')
if url:
if '/jobs' in url:
links.append(url)
return links
def getID(url):
pUrl = urlparse.urlparse(url)
return urlparse.parse_qs(pUrl.query)['id'][0]
def ViewBot(browser):
visited = {}
pList = []
count = 0
while True:
#sleep to make sure everything loads, add random to make us look human.
time.sleep(random.uniform(3.5,6.9))
page = BeautifulSoup(browser.page_source)
people = getPeopleLinks(page)
if people:
for person in people:
ID = getID(person)
if ID not in visited:
pList.append(person)
visited[ID] = 1
if pList: #if there is people to look at look at them
person = pList.pop()
browser.get(person)
count += 1
else: #otherwise find people via the job pages
jobs = getJobLinks(page)
if jobs:
job = random.choice(jobs)
root = 'http://www.linkedin.com'
roots = 'https://www.linkedin.com'
if root not in job or roots not in job:
job = 'https://www.linkedin.com'+job
browser.get(job)
else:
print "I'm Lost Exiting"
break
#Output (Make option for this)
print "[+] "+browser.title+" Visited! \n("\
+str(count)+"/"+str(len(pList))+") Visited/Queue)"
def Main():
parser = argparse.ArgumentParser()
parser.add_argument("email", help="linkedin email")
parser.add_argument("password", help="linkedin password")
args = parser.parse_args()
browser = webdriver.Firefox()
browser.get("https://linkedin.com/uas/login")
emailElement = browser.find_element_by_id("session_key-login")
emailElement.send_keys(args.email)
passElement = browser.find_element_by_id("session_password-login")
passElement.send_keys(args.password)
passElement.submit()
Running this on OSX.
Upvotes: 1
Views: 692
Reputation: 1
You can pass the args to the python scheduler.
scheduler.enter(delay, priority, action, argument=(), kwargs={}) Schedule an event for delay more time units. Other than the relative time, the other arguments, the effect and the return value are the same as those for enterabs(). Changed in version 3.3: argument parameter is optional. New in version 3.3: kwargs parameter was added.
>>> import sched, time
>>> s = sched.scheduler(time.time, time.sleep)
>>> def print_time(a='default'):
... print("From print_time", time.time(), a)
...
>>> def print_some_times():
... print(time.time())
... s.enter(10, 1, print_time)
... s.enter(5, 2, print_time, argument=('positional',))
... s.enter(5, 1, print_time, kwargs={'a': 'keyword'})
... s.run()
... print(time.time())
...
>>> print_some_times()
930343690.257
From print_time 930343695.274 positional
From print_time 930343695.275 keyword
From print_time 930343700.273 default
930343700.276
Upvotes: 0
Reputation: 842
Have you tried using LinkedIn's REST Api instead of retrieving heavy pages, filling in some form and sending it back?
Your code is prone to be broken whenever LinkedIn changes some elements in their page. Whereas the Api is a contract between LinkedIn and the users.
Check here https://developer.linkedin.com/docs/rest-api and there https://developer.linkedin.com/docs/guide/v2/concepts/methods
So that you don't have to pass your credentials through command line (especially your password, which will be readable in clear through history), you should either
Moreover, for the scheduling part, you can use cron.
If you're looking for a 100% Python solution, you can use the excellent Celery project. Check its periodic tasks.
Upvotes: 0
Reputation: 1433
I can see at least two different way of automating the trigger of your script. Since you are mentioning that your script is started this way:
python scriptname.py [email protected] password
It means that you start it from a shell. As you want to have it scheduled, it sounds like a Crontab is a perfect answer. (see https://kvz.io/blog/2007/07/29/schedule-tasks-on-linux-using-crontab/ for example)
If you really want to use python scheduler, you can use the subprocess.
In your file using python scheduler:
import subprocess
subprocess.call("python scriptname.py [email protected] password", shell=True)
What is the best way to call a Python script from another Python script?
Upvotes: 1