Luana Nitsche

Reputation: 11

How to optimize apscheduler for parallel processing with ThreadPoolExecutor and ProcessPoolExecutor

I'm using apscheduler to run several robot tasks in parallel, but performance is inconsistent. I need to process 5,000 image files as quickly as possible on a machine with 1 core and 8 logical processors.

Initially, each task completes in about 10 seconds, but after the scheduler has been running for a while, the processing time increases to 1-2 minutes per task. I'm not sure what's causing this slowdown.

Here is a minimal version of my code:

from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.executors.pool import ThreadPoolExecutor, ProcessPoolExecutor
import time

# Mock function to simulate processing images
def process_images(task_name, robot_id):
    print(f"Processing task {task_name} with {robot_id}")
    time.sleep(10)  # Simulate a task that takes 10 seconds to complete

if __name__ == '__main__':
    # Configuring executors with thread and process pools
    executors = {
        # Jobs run on 'default' unless add_job() names another executor
        'default': ThreadPoolExecutor(max_workers=1),
        # Defined here, but no job below is assigned to it
        'processpool': ProcessPoolExecutor(max_workers=1)
    }

    # Using BackgroundScheduler for non-blocking execution
    scheduler = BackgroundScheduler(executors=executors, timezone='America/Sao_Paulo')

    # Adding jobs to the scheduler
    for i in range(1, 7):
        scheduler.add_job(process_images, 'interval', seconds=60, args=['ATIVPROCIMA', f'Robo{i}'], id=f'job_{i}')

    print('Starting scheduler. Press Ctrl+C to exit.')

    try:
        scheduler.start()
        while True:
            time.sleep(1)  # Keep the main thread alive
    except (KeyboardInterrupt, SystemExit):
        scheduler.shutdown()
        print("Scheduler stopped.")

Observations and Troubleshooting:

Tasks initially process in about 10 seconds but slow down to 1-2 minutes over time.

What I've tried: I set max_workers to 6 for both ThreadPoolExecutor and ProcessPoolExecutor, but it didn't significantly improve performance.

What I was expecting: I expected the scheduler to run 6 tasks at the same time. For example, 6 tasks (6 calls of the same function) should finish in about 10 seconds in total, and that performance should hold throughout the schedule. However, after the second or third iteration, it takes more than a minute to process a single image file.
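To make that expectation concrete, here is a minimal sketch that uses only the standard library's concurrent.futures, not apscheduler itself. It times the same six sleep-based tasks on a pool with 1 worker (the value in the posted code) and on a pool with 6 workers. The names fake_task and run_batch and the shortened 0.2-second sleep are assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

TASK_SECONDS = 0.2  # stand-in for the 10-second task, shortened for the demo
NUM_TASKS = 6


def fake_task(robot_id):
    """Simulate one I/O-bound robot task by sleeping."""
    time.sleep(TASK_SECONDS)
    return robot_id


def run_batch(max_workers):
    """Run NUM_TASKS tasks on a pool of the given size; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in submission order
        results = list(pool.map(fake_task, range(1, NUM_TASKS + 1)))
    assert results == list(range(1, NUM_TASKS + 1))
    return time.perf_counter() - start


serial_seconds = run_batch(max_workers=1)
parallel_seconds = run_batch(max_workers=NUM_TASKS)
print(f"1 worker:  {serial_seconds:.2f}s")
print(f"{NUM_TASKS} workers: {parallel_seconds:.2f}s")
```

With a single worker the six tasks run one after another, so the batch takes roughly six times as long as one task. The speedup shown here relies on the tasks sleeping; if the real image processing is CPU-bound, threads in CPython would likely not scale the same way because of the GIL.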

Question: How can I configure apscheduler to optimize performance and ensure consistent processing times for my robot tasks? What could be causing the performance degradation, and how can I troubleshoot it?
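As one possible troubleshooting step, the sketch below wraps the job function so that every call records how long the function body itself ran and on which thread. It uses only the standard library; the timed decorator, the timings list, and the shortened sleep are hypothetical names and values introduced for illustration, not part of apscheduler.

```python
import functools
import threading
import time

timings = []  # (function name, thread name, seconds) tuples, filled by the wrapper
_timings_lock = threading.Lock()


def timed(func):
    """Wrap func so each call records how long the call itself ran."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            with _timings_lock:
                timings.append(
                    (func.__name__, threading.current_thread().name, elapsed)
                )
    return wrapper


@timed
def process_images(task_name, robot_id):
    time.sleep(0.05)  # shortened stand-in for the real work
    return f"{task_name}:{robot_id}"


result = process_images("ATIVPROCIMA", "Robo1")
name, thread_name, seconds = timings[-1]
print(f"{name} ran on {thread_name} for {seconds:.3f}s")
```

If the recorded run time stays near 10 seconds while the end-to-end time per task grows to minutes, the extra time is probably spent waiting for a free worker rather than inside the function. Note that the in-memory list only works with threads; with a process pool each worker process would have its own copy, so writing the measurements to a log would be needed instead.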

Upvotes: 1

Views: 217

Answers (0)
