Sauvage

Reputation: 71

Python schedule with commandline

I want to automate a script. In past projects I've used the Python scheduler for this, but for this project I'm unsure how to handle it.

The problem is that the script uses login details that are not stored in the code; they are entered on the command line when launching the script.

ex. python scriptname.py [email protected] password

How can I automate this with python scheduler? The code that is in 'scriptname.py' is:

# LinkedBot.py
import argparse, os, time
import urlparse, random
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

def getPeopleLinks(page):
    links = []
    for link in page.find_all('a'):
        url = link.get('href')
        if url:
            if 'profile/view?id=' in url:
                links.append(url)
    return links

def getJobLinks(page):
    links = []
    for link in page.find_all('a'):
        url = link.get('href')
        if url:       
            if '/jobs' in url:
                links.append(url)
    return links

def getID(url):
    pUrl = urlparse.urlparse(url)
    return urlparse.parse_qs(pUrl.query)['id'][0]


def ViewBot(browser):
    visited = {}
    pList = []
    count = 0
    while True:
        #sleep to make sure everything loads, add random to make us look human.
        time.sleep(random.uniform(3.5,6.9))
        page = BeautifulSoup(browser.page_source)
        people = getPeopleLinks(page)
        if people:
            for person in people:
                ID = getID(person)
                if ID not in visited:
                    pList.append(person)
                    visited[ID] = 1
        if pList:  # if there are people to look at, look at them
            person = pList.pop()
            browser.get(person)
            count += 1
        else: #otherwise find people via the job pages
            jobs = getJobLinks(page)
            if jobs:
                job = random.choice(jobs)
                root = 'http://www.linkedin.com'
                roots = 'https://www.linkedin.com'
                if root not in job and roots not in job:
                    job = 'https://www.linkedin.com'+job
                browser.get(job)
            else:
                print "I'm Lost Exiting"
                break

        #Output (Make option for this)           
        print "[+] "+browser.title+" Visited! \n("\
            +str(count)+"/"+str(len(pList))+") Visited/Queue)"


def Main():
    parser = argparse.ArgumentParser()
    parser.add_argument("email", help="linkedin email")
    parser.add_argument("password", help="linkedin password")
    args = parser.parse_args()

    browser = webdriver.Firefox()

    browser.get("https://linkedin.com/uas/login")


    emailElement = browser.find_element_by_id("session_key-login")
    emailElement.send_keys(args.email)
    passElement = browser.find_element_by_id("session_password-login")
    passElement.send_keys(args.password)
    passElement.submit()

Running this on OSX.

Upvotes: 1

Views: 692

Answers (3)

Igor Gradov

Reputation: 1

You can pass the arguments to the Python scheduler (the sched module) through the argument and kwargs parameters of scheduler.enter(). From the documentation:

scheduler.enter(delay, priority, action, argument=(), kwargs={})

Schedule an event for delay more time units. Other than the relative time, the other arguments, the effect and the return value are the same as those for enterabs().

Changed in version 3.3: argument parameter is optional.
New in version 3.3: kwargs parameter was added.

>>> import sched, time
>>> s = sched.scheduler(time.time, time.sleep)
>>> def print_time(a='default'):
...     print("From print_time", time.time(), a)
...
>>> def print_some_times():
...     print(time.time())
...     s.enter(10, 1, print_time)
...     s.enter(5, 2, print_time, argument=('positional',))
...     s.enter(5, 1, print_time, kwargs={'a': 'keyword'})
...     s.run()
...     print(time.time())
...
>>> print_some_times()
930343690.257
From print_time 930343695.274 positional
From print_time 930343695.275 keyword
From print_time 930343700.273 default
930343700.276
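Applied to the question, a minimal sketch (Python 3) could hand the login details to the scheduled function through `argument` instead of reading them from the command line. The names `run_bot`, `schedule_bot` and `completed_runs` are hypothetical and only illustrate the idea; `run_bot` stands in for whatever `Main()` does after parsing its arguments.

```python
import sched
import time

# Hypothetical log of finished runs, used here only to make the effect visible.
completed_runs = []

def run_bot(email, password):
    # Placeholder for the real work (log in with Selenium, call ViewBot, ...).
    completed_runs.append(email)

def schedule_bot(scheduler, delay, email, password):
    # The credentials travel with the event through `argument`.
    return scheduler.enter(delay, 1, run_bot, argument=(email, password))

# Usage (blocks until the event has fired):
#   s = sched.scheduler(time.time, time.sleep)
#   schedule_bot(s, 10, "user@example.com", "password")
#   s.run()
```

Note that `s.run()` blocks until the queue is empty, so the process has to stay alive for as long as events are pending.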

Upvotes: 0

Samuel GIFFARD

Reputation: 842

About the code itself

LinkedIn REST API

Have you tried using LinkedIn's REST API instead of retrieving heavy pages, filling in a form and submitting it?

Your code is likely to break whenever LinkedIn changes elements on their pages, whereas the API is a contract between LinkedIn and its users.

Check here https://developer.linkedin.com/docs/rest-api and there https://developer.linkedin.com/docs/guide/v2/concepts/methods

Credentials

So that you don't have to pass your credentials on the command line (especially your password, which will be readable in clear text in your shell history), you should either

  • use a config file (with your API key) and read it with ConfigParser (or anything else, depending on the format of your config file: JSON, Python, etc.)
  • or set them as environment variables.
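Both options can be combined. The sketch below (Python 3) is only an illustration: the variable names `LINKEDIN_EMAIL` and `LINKEDIN_PASSWORD`, the file name `linkedbot.ini` and its `[linkedin]` section are assumptions, not anything defined by LinkedIn or by the original script.

```python
import configparser
import os

def load_credentials(config_path="linkedbot.ini", env=os.environ):
    """Return (email, password) from the environment, else from a config file."""
    # Hypothetical variable names; pick whatever suits your setup.
    email = env.get("LINKEDIN_EMAIL")
    password = env.get("LINKEDIN_PASSWORD")
    if email and password:
        return email, password

    # interpolation=None so that a '%' inside the password is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    # read() skips missing files and returns the list of files it parsed.
    if not parser.read(config_path):
        raise RuntimeError("no credentials in environment or " + config_path)
    section = parser["linkedin"]
    return section["email"], section["password"]
```

With this in place, `Main()` would call `load_credentials()` instead of `parser.parse_args()`, and the scheduled command no longer carries the password.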

For the scheduling

Using Cron

For the scheduling part, you can use cron.

Using Celery

If you're looking for a 100% Python solution, you can use the excellent Celery project. Check its periodic tasks.

Upvotes: 0

Luc

Reputation: 1433

I can see at least two different ways of triggering your script automatically. You mention that your script is started this way:

python scriptname.py [email protected] password

This means that you start it from a shell. Since you want to have it scheduled, a crontab entry sounds like a perfect fit (see https://kvz.io/blog/2007/07/29/schedule-tasks-on-linux-using-crontab/ for example).

If you really want to use the Python scheduler, you can use the subprocess module.

In the file that uses the Python scheduler:

import subprocess

subprocess.call("python scriptname.py [email protected] password", shell=True)
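A slightly fuller sketch (Python 3), assuming the standard-library `sched` module is the scheduler in question. `build_command` and `run_periodically` are hypothetical helper names. The command is passed as a list rather than a shell string, which sidesteps quoting problems if the password contains special characters.

```python
import sched
import subprocess
import sys
import time

def build_command(script, email, password):
    # Argument list, so no shell is involved and no quoting is needed.
    return [sys.executable, script, email, password]

def run_periodically(scheduler, interval, command, runs):
    """Run `command`, then re-schedule itself until `runs` executions are done."""
    subprocess.call(command)
    if runs > 1:
        scheduler.enter(interval, 1, run_periodically,
                        argument=(scheduler, interval, command, runs - 1))

# Usage: run scriptname.py once an hour, 24 times in total.
#   s = sched.scheduler(time.time, time.sleep)
#   cmd = build_command("scriptname.py", "user@example.com", "password")
#   s.enter(0, 1, run_periodically, argument=(s, 3600, cmd, 24))
#   s.run()
```

The interval is measured from the end of one run to the start of the next, so long-running scripts will drift; cron does not have that limitation.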

See also: What is the best way to call a Python script from another Python script?

Upvotes: 1
