Chris McL

Reputation: 302

Automate daily python process on remote server for improved reliability

I have a Python script that runs locally via a scheduled task each day. Most of the time this is fine, except when I'm on vacation and the computer it runs on needs to be manually restarted, or when my internet or power is down.

I am interested in putting it on some kind of rented server time. I'm a total newbie at this (having never run a production-type process like this), and I was unable to find any tutorials that address this type of use case. How would I install my Python environment and any config files, data files, or programs that the script needs (e.g., it does some web scraping and uses headless Chrome with a defined user profile)?

Given the nature of the program, is this possible, or would I need a dedicated server whose environment can be set up for my specific needs? The process runs for about 20 seconds a day.

Upvotes: 3

Views: 872

Answers (2)

d1sh4

Reputation: 1810

Setting up a whole dedicated server for 20 seconds' worth of work is really suboptimal. I see a few options:

  • Get a cloud-based VM that is spun up and down only to run your process. That's relatively easy to automate on Azure, GCP, and AWS.
  • Dockerize the application along with its whole environment and run it as an image in the cloud, e.g. on a service like Beanstalk (AWS) or App Service (Azure). This is more complex, but should be cheaper as it consumes fewer resources (see the Dockerfile sketch after this list).
  • Get a dedicated VM (a droplet?) on a service like Digital Ocean, Heroku or pythonanywhere.com. Depending on the specifics of your script, it may be quite easy and cheap to set up. I think this is the easiest and most flexible solution for a newbie, but it really depends on your script; you might hit some limitations.
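
If you go the Docker route, the idea is to bake the interpreter, the dependencies and the script into one image. A minimal sketch, assuming a Debian-based Python base image and that the scraper needs Chromium for the headless browsing mentioned in the question (the file names and requirements.txt are placeholders, not something from your setup):

    FROM python:3.11-slim

    # Chromium + driver for the headless-Chrome scraping described in the question
    RUN apt-get update && apt-get install -y --no-install-recommends \
            chromium chromium-driver \
        && rm -rf /var/lib/apt/lists/*

    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    COPY daily_script.py .
    CMD ["python", "daily_script.py"]

You would then build the image locally, push it to a registry and let the cloud service run it on a schedule, so nothing Python-related has to be installed on the server itself.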

In terms of setting up your environment, there are multiple options, the most often used being:

  • pyenv (my preferred option)
  • anaconda (quite easy to use)
  • virtualenv / venv

To efficiently recreate your environment, you'll need to come up with a list of dependencies (libraries your script uses).

A summary of the steps:

  1. run $ pip freeze > requirements.txt locally
  2. manually edit the requirements.txt file by removing all packages that are not used by your script
  3. create a new virtual environment via pyenv, anaconda or venv and activate it wherever you want to run the script
  4. copy your script & requirements.txt to the new location
  5. run $ pip install -r requirements.txt to install the libraries
  6. ensure the script works as expected in its new location
  7. set up the cron job (a sample crontab entry follows this list)
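
For the last step, assuming a Linux host with cron available, a crontab entry that runs the script every day at 06:00 could look like this (the paths and the virtual environment location are placeholders, not something from the question):

    # m  h  dom mon dow  command
    0    6  *   *   *    /home/me/envs/scraper/bin/python /home/me/scraper/daily_script.py >> /home/me/scraper/cron.log 2>&1

Add the line with crontab -e; pointing at the environment's own python binary avoids having to activate the virtualenv inside cron, and redirecting output to a log file makes failures visible.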

Upvotes: 2

miwin

Reputation: 1215

If the script only runs for 20 seconds and you are not worried about scalability, running it directly on a NAS or a Raspberry Pi could be a solution for a private environment, if you have the hardware on hand.

If you don’t have the necessary hardware available, you may want to have a look at PythonAnywhere, which offers a free version.

https://help.pythonanywhere.com/pages/ScheduledTasks/

https://www.pythonanywhere.com/

However, in any professional environment I would opt for a tool like Apache Airflow. The process you describe (“it does some web scraping and uses headless chrome w/a defined user profile”) is essentially an ETL workflow.

https://airflow.apache.org/
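
To give an idea of what that looks like, here is a minimal daily DAG that simply shells out to the existing script. The DAG id, schedule and script path are placeholders, and this assumes Airflow 2.x:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # A single-task DAG that runs the existing script once a day.
    with DAG(
        dag_id="daily_scraper",            # placeholder name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        run_scraper = BashOperator(
            task_id="run_scraper",
            bash_command="python /opt/scripts/daily_script.py",  # placeholder path
        )

On top of the schedule itself, Airflow gives you retries, logging and alerting, which is the main advantage over a bare cron job for a workflow like this.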

Upvotes: 1
