Niko

Reputation: 800

Luigi: Dependency between output and requires

Every Luigi task has, among others, two basic methods:

  1. requires
  2. output

The documentation is not clear about the following: what happens to the tasks defined under requires if the output already exists? To make it concrete, consider this dummy task:

import luigi


class DummyTask(luigi.Task):
    def run(self):
        pass

    def requires(self):
        # TaskA and TaskB are other tasks defined elsewhere
        yield TaskA()
        yield TaskB()

    def output(self):
        return luigi.LocalTarget('foo.txt')

Will Luigi first check whether foo.txt exists and only then decide whether to fulfill the tasks under requires, or will it first fulfill all the tasks under requires and then check whether the output exists before deciding whether to call the run method?

Upvotes: 0

Views: 1148

Answers (1)

Niko

Reputation: 800

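The answer is that Luigi will NOT run any of the tasks under requires if it finds that the target already exists. Luigi first asks the task whether it is complete (by default, complete() returns True when all of the task's outputs exist), and the requirements of a complete task are not scheduled. The following snippet showcases that behavior: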

import luigi
from luigi import Task
import logging
from pathlib import Path


class Dummy(Task):
    logger = logging.getLogger('Dummy')
    output_path = '/tmp/dummy_task_output.txt'

    def run(self):
        self.logger.info('Running dummy task')
        Path(self.output_path).touch()

    def requires(self):
        yield TaskA()
        yield TaskB()

    def output(self):
        return luigi.LocalTarget(self.output_path)


class TaskA(Task):
    logger = logging.getLogger('TaskA')
    task_complete = False

    def run(self):
        self.logger.info('Running TaskA')
        self.task_complete = True

    def complete(self):
        return self.task_complete


class TaskB(Task):
    logger = logging.getLogger('TaskB')
    task_complete = False

    def run(self):
        self.logger.info('Running TaskB')
        self.task_complete = True

    def complete(self):
        return self.task_complete

Save it as dependency_test.py and run it from the same directory with PYTHONPATH='.' luigi --module dependency_test Dummy --local-scheduler

The output on the first run:

===== Luigi Execution Summary =====

Scheduled 3 tasks of which:
* 3 ran successfully:
    - 1 Dummy()
    - 1 TaskA()
    - 1 TaskB()

This progress looks :) because there were no failed tasks or missing dependencies

===== Luigi Execution Summary =====

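The output on the second run, now that /tmp/dummy_task_output.txt exists, is shown below. TaskA and TaskB are not even scheduled, although their in-memory complete() flag is False again in the new process: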

===== Luigi Execution Summary =====

Scheduled 1 tasks of which:
* 1 complete ones were encountered:
    - 1 Dummy()

Did not run any tasks
This progress looks :) because there were no failed tasks or missing dependencies

===== Luigi Execution Summary =====
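The two runs above can be pictured with a small stand-alone model of the dependency walk. This is only a sketch and not Luigi's actual code: ToyTask, schedule and demo are made-up names, and the model assumes nothing beyond what the summaries show, namely that a task's completeness is checked before its requirements are looked at.

```python
class ToyTask:
    """Hypothetical stand-in for a luigi.Task; not part of Luigi."""

    def __init__(self, name, requires=(), done=False):
        self.name = name
        self._requires = list(requires)
        self.done = done

    def complete(self):
        return self.done

    def requires(self):
        return self._requires

    def run(self):
        self.done = True


def schedule(task, ran=None):
    """Run `task` after its requirements, skipping complete tasks entirely."""
    if ran is None:
        ran = []
    if task.complete():
        # A complete task is a dead end: its requirements are never visited.
        return ran
    for dep in task.requires():
        schedule(dep, ran)
    task.run()
    ran.append(task.name)
    return ran


def demo():
    # First run: nothing is complete, so everything runs, dependencies first.
    first = schedule(ToyTask("Dummy", [ToyTask("TaskA"), ToyTask("TaskB")]))
    # Second run: Dummy's output "exists", so TaskA and TaskB are never reached.
    second = schedule(
        ToyTask("Dummy", [ToyTask("TaskA"), ToyTask("TaskB")], done=True)
    )
    return first, second
```

In this model the first call runs TaskA, TaskB and then Dummy, while the second call runs nothing, which mirrors the "Scheduled 3 tasks" and "Scheduled 1 tasks" summaries.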

Upvotes: 1
