Hemesh Patel

Reputation: 25

re-use similar luigi tasks

I have a luigi task which reads a .sql file and outputs to BigQuery.

Is there any way to reuse that same task with a different .sql file without copying the whole luigi task? In other words, I want to create instances of a template luigi task.

class run_sql(luigi.Task):
    sql_file = 'path/to/sql/file'  # This is the only bit of code that changes 
    def complete(self):
        ...
    def requires(self):
        ...
    def run(self):
        ...

Upvotes: 1

Views: 369

Answers (2)

cangers

Reputation: 408

Building on @matagus' answer, you can also subclass RunSql to set the SQL file. The subclass inherits the complete(), requires(), and run() methods of the parent class.

class RunSqlFile(RunSql):
    sql_file = '/path/to/file.sql'

Or you can use the @property decorator to reference attributes of the RunSql class. I often do this to set a directory, or other configuration data, in the parent class, then reference them in subclasses.

import os

import luigi


class RunSql(luigi.Task):
    sql_file = luigi.Parameter()

    def get_file(self, name):
        # Resolve a file name relative to the shared SQL directory
        default_dir = '/path/to/sql/dir'
        return os.path.join(default_dir, name)

    def requires(self):
        ...


class RunSqlFile(RunSql):

    @property
    def sql_file(self):
        return self.get_file("query.sql")

This behaves as if you had run the task with --sql-file /path/to/sql/dir/query.sql

Upvotes: 1

matagus

Reputation: 6206

Just use a parameter to specify the path to the file. Something like this:

class RunSql(luigi.Task):

    sql_file = luigi.Parameter()

    def complete(self):
        ...

    def requires(self):
        ...

    def run(self):
        ...

To access the parameter's value, use self.sql_file in your code.
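As a rough illustration of using the parameter value inside run(), the sketch below reads the query text from the path held in self.sql_file. The helper name read_sql is hypothetical (it is not part of luigi), the file is assumed to be UTF-8 text, and submitting the query to BigQuery is left out.

```python
from pathlib import Path


def read_sql(sql_file):
    """Return the query text stored at sql_file.

    Hypothetical helper for illustration; not part of luigi.
    """
    # str() so that both plain strings and path-like values are accepted
    return Path(str(sql_file)).read_text(encoding="utf-8").strip()


# Inside the task's run() method this might be called as:
#     query = read_sql(self.sql_file)
```

Keeping the file handling in a small function like this means the task body stays the same for every .sql file; only the parameter value changes.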

After that you may run your task this way:

luigi RunSql --sql-file path/to/file.sql

Upvotes: 1
