Sam Comber
Sam Comber

Reputation: 1293

How to have instance attribute defined and then persisted across child classes

I'm trying to have a instance attribute, root_path, persist across my code. I have an InitialiseStep class which accepts root_path, and when it is defined i.e. by InitialiseStep(root_path="/test/path/") I want CampaignData to be able to print the value "/test/path/" when I trigger it as part of a pipeline.

A reproducible example should make this clearer...

class InitialiseStep():
    def __init__(self, root_path=None):
        self.root_path = root_path

    def execute(self):
        print(self.root_path)


class SparkStep(InitialiseStep):
    def __init__(self, spark=None):
        super().__init__()
        self.spark = spark
        
    def execute(self):
        print(self.root_path)


class CampaignData(SparkStep):
    def __init__(self, spark):
      super().__init__()

      
class Executor:
    def __init__(self, steps):
        self.steps = steps
       
    def run_step(self, step):
        step.execute()

    def run_pipeline(self):
        for step in self.steps:
            self.run_step(step)
        
executor = Executor([InitialiseStep(root_path="/test/path/"), CampaignData(spark)])
executor.run_pipeline()

This currently outputs

/test/path/

None

How can I adapt this code to print?

/test/path/

/test/path/

Any suggestions would be greatly appreciated!

Upvotes: 0

Views: 54

Answers (1)

juanpa.arrivillaga
juanpa.arrivillaga

Reputation: 95948

So, the problem is that root_path is an instance variable. It belongs to the instance, not even other instances of that class, or any other subclass for that matter.

You could use a class variable, that would solve your problem:

class InitialiseStep():
    def __init__(self, root_path=None):
        InitialiseStep.root_path = root_path

    def execute(self):
        print(self.root_path)

But I would find this design highly inadvisable. I would either bite the bullet and just provide root_path everywhere you need it explicitly, e.g., handing it to CampaignData(root_path, spark).

Or probably, you should rethink where this needs to be handled. For example, it could be an attribute of the Executor,

class Executor:
    def __init__(self, steps, root_path):
        self.steps = steps
        self.root_path = root_path

Injected as an argument to execute:

def execute(self, root_path=None):
    print(root_path)

And the executor then can provide it to the Step, i.e.:

def run_step(self, step):
    step.execute(root_path=self.root_path)

Still, this might make one uneasy, this is making our classes more tightly coupled. Perhaps you just need another way to handle this with some sort of MetaData class.

Upvotes: 1

Related Questions