Milano

Reputation: 18735

Scrapy - how to check if spider is running

I have a Scrapy spider which I run every hour using bash script and crontab.

The spider's running time is about 50 minutes, but it can take more than an hour.

What I want is to check whether the spider is still running, and start a new crawl only if it is not.

BASH SCRIPT

#!/usr/bin/env bash

source /home/milano/.virtualenvs/keywords_search/bin/activate
cd /home/milano/PycharmProjects/keywords_search/bot

# HERE I WANT TO CHECK WHETHER THE PREVIOUS CRAWL HAS ALREADY STOPPED; IF NOT, DO NOTHING

scrapy crawl main_spider

The only thing that comes to my mind is to use telnet.

If telnet localhost 6023 can connect, it means the spider is still running; otherwise I can start a new crawl.
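Something like this is what I have in mind - a rough sketch, assuming the Scrapy telnet console is on its default port 6023 and that nc is available (I have not tested this):

if nc -z localhost 6023; then
    # Port is open - a spider is presumably still running, so do nothing.
    exit 0
fi
scrapy crawl main_spider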

Upvotes: 3

Views: 1108

Answers (1)

Maresh

Reputation: 4712

You need some sort of locking mechanism.

The best way to achieve an atomic lock from bash is to use mkdir and check its exit code to know whether you acquired the lock.

Here's a more in-depth explanation: http://wiki.bash-hackers.org/howto/mutex
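Applied to your script, a minimal sketch could look like this (the lock directory path is just an example):

LOCKDIR=/tmp/main_spider.lock

if mkdir "$LOCKDIR" 2>/dev/null; then
    # mkdir is atomic: we acquired the lock, so release it again on exit.
    trap 'rmdir "$LOCKDIR"' EXIT
    scrapy crawl main_spider
else
    # mkdir failed, so a previous crawl still holds the lock - do nothing.
    echo "Previous crawl still running, skipping."
fi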

Of course, you could always go for dirtier methods, like grepping process names.
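For example (pgrep -f matches against the full command line, so this is brittle if anything else matches the pattern):

if pgrep -f "scrapy crawl main_spider" > /dev/null; then
    # A matching process exists - assume the previous crawl is still going.
    exit 0
fi
scrapy crawl main_spider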

You could also put the lock in Scrapy itself, e.g. a simple middleware that checks a shared resource... Plenty of ways to do it :)

Upvotes: 1
