Keweik

Reputation: 187

Luigi: dealing with missing tasks when a task has multiple dependencies

Let's say I have about 5 tasks for 5 different periods, each outputting one Excel file. I then need to merge these 5 output files in a new task. If one of the tasks has not completed, I still want the remaining 4 files to be merged into a single file. Is there a way of doing this in Luigi? Here is some sample code that might help explain the question:

import luigi

class MakeFile(luigi.Task):
    period = luigi.Parameter()

    def run(self):
        # clean the data for this period, then:
        return cleaned_file

class MergeFiles(luigi.Task):
    def requires(self):
        periods = ...  # multiple periods
        for period in periods:
            yield MakeFile(period)

    def run(self):
        # merge files here
        pass

Upvotes: 0

Views: 187

Answers (1)

iHowell

Reputation: 2447

To do what you want, you can write nothing to your output. Luigi considers a task complete when every target returned by its output method exists, regardless of what the target contains. So for a period with no data, you could just open and close the Excel file without writing anything, and then test in MergeFiles whether each input file is empty before merging it.
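As a rough sketch of that idea in plain Python (no Luigi involved yet): the helper names `touch_placeholder` and `non_empty` and the file names are made up for illustration, and "empty" here simply means a file size of zero bytes.

```python
import os
import tempfile


def touch_placeholder(path):
    """Create an empty file so that an existence check on `path` succeeds."""
    with open(path, "wb"):
        pass


def non_empty(paths):
    """Return only the paths whose files contain data, i.e. the real outputs."""
    return [p for p in paths if os.path.getsize(p) > 0]


# Demo with throwaway files: one real output and one empty placeholder.
workdir = tempfile.mkdtemp()
real = os.path.join(workdir, "period_1.xlsx")
missing = os.path.join(workdir, "period_2.xlsx")

with open(real, "wb") as fh:
    fh.write(b"data")  # stands in for real Excel content
touch_placeholder(missing)

# Only the file with actual content is kept for merging.
to_merge = non_empty([real, missing])
```

The merge step would then loop over `to_merge` only, so a period without data does not block the merge.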

Beyond that, you have made a couple of mistakes in your current classes.

  1. In MakeFile, returning a value from run has no effect; Luigi does not use it as the task's output. You need to create an output method and return targets from it. See https://luigi.readthedocs.io/en/stable/tasks.html#task-output for more details.

  2. In the requires method of MergeFiles, you don't need to yield. yield is mainly meant for run, when a running task needs to dynamically require additional tasks. If that is actually what you need, you can read more here: https://luigi.readthedocs.io/en/stable/tasks.html#dynamic-dependencies. I think you should just use return [MakeFile(period) for period in periods] in your requires. Then you can access their outputs in run by using self.input().

Upvotes: 1
