stn53

Reputation: 97

AttributeError thrown at build time but not in preview mode

I want to write unstructured data to a file in a transform. Running the transform in preview mode succeeds, but building the dataset fails. Specifically, it seems that the FileSystem object only has a files_dir attribute when previewing, not at build time.

from transforms.api import transform, Input, Output


@transform(
    data_input=Input("..."),
    data_output=Output("..."),
)
def transform(data_input, data_output):
    for file_status in data_input.filesystem().files('*.<file_ending>').collect():
        # PreProcessor is a custom helper class
        data = PreProcessor(f"{data_input.filesystem().files_dir}/{file_status.path}")

        # Throws "AttributeError: 'FileSystem' object has no attribute 'files_dir'"
        data.write(f"{data_output.filesystem().files_dir}/transformed_file")

How can I resolve this issue? Why is the files_dir attribute not set at build time?

Upvotes: 1

Views: 172

Answers (2)

stn53

Reputation: 97

The issue occurs because files_dir is only set at runtime during preview mode, not at build time. I used the hadoop_path attribute instead, which fixed the issue.
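
For reference, a minimal sketch of the working version (PreProcessor and the dataset paths are placeholders from the question):

from transforms.api import transform, Input, Output


@transform(
    data_input=Input("..."),
    data_output=Output("..."),
)
def transform(data_input, data_output):
    fs_in = data_input.filesystem()
    fs_out = data_output.filesystem()
    for file_status in fs_in.files('*.<file_ending>').collect():
        # hadoop_path is populated both in preview and at build time
        data = PreProcessor(f"{fs_in.hadoop_path}/{file_status.path}")
        data.write(f"{fs_out.hadoop_path}/transformed_file")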

Upvotes: 2

fmsf

Reputation: 37177

Preview and build run against different data volumes, which means you may get slightly different results. Preview uses a subset of files, or a subset of data (only 1000 rows), so code that works in preview may not hit edge cases that are absent from the particular subset chosen for computation.

As an example, here's some logic where I extract all headers from a collection of CSV files. It works both in preview and at build time.

from pyspark.sql import types as T
from transforms.api import transform, Input, Output


@transform(
    out=Output("out..."),
    raw=Input("in..."),
)
def extract_all_headers(raw, out, ctx):
    rows = []
    headers = {}
    # List every CSV file in the input dataset
    files = raw.filesystem().ls("*.csv")
    for csv in files:
        with raw.filesystem().open(csv.path) as f:
            # Only the header line of each file is needed
            header = f.readline()
            for name in header.split(","):
                name = name.strip()
                if name not in headers:
                    headers[name] = True
                    rows.append([name])

    schema = T.StructType([T.StructField("column_name", T.StringType(), True)])

    out.write_dataframe(ctx.spark_session.createDataFrame(rows, schema=schema))

Upvotes: 0
