ai.jennetta

Reputation: 1184

Metaflow data objects not storing in S3

I have a Metaflow file that runs successfully with the following step:

@step
def scale(self):
    import redshift
    import pandas as pd
    self.event_matrix = self.jointable.pivot_table(index='user_name', columns='event_name', values='odds')
    self.event_matrix_t_scaled = self.event_matrix.T.apply(redshift.scale_user)
    self.tester = 1

    self.next(self.end)

When I open up a notebook and run

from metaflow import Flow

run = Flow("Recommender").latest_successful_run
print(f'Using run: {run}')
print(run.data)

it outputs

<MetaflowData: event_user_scaled_matrix, tester, event_matrix, jointable, event_matrix_t_scaled>

When I run run.data.event_matrix, it returns a DataFrame. However, run.data.event_user_scaled_matrix, run.data.event_matrix_t_scaled, and run.data.tester all return the error:

S3 datastore operation _get_s3_object failed (An error occurred (400) when calling the HeadObject operation: Bad Request). Retrying 7 more times..

which leads me to believe that these objects are not getting written to an S3 bucket. But I don't understand what is different between the object that works and the ones that do not.

Can someone help me see what I am missing?

Upvotes: 3

Views: 901

Answers (1)

Romain

Reputation: 56

You can see the path that Metaflow is trying to access in S3 by using, for example, run.end_task.artifacts.tester._object. This may allow you to debug where the file is supposed to be and why it may no longer be there. There should be no difference between the artifacts you mentioned.
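As a rough sketch (the contents of _object are internal to Metaflow and may differ between versions, so treat what it prints as debug information only), you could run something like this in your notebook to compare the working and failing artifacts:

from metaflow import Flow

run = Flow("Recommender").latest_successful_run

# Print the underlying S3 object metadata for each artifact so you can
# see where Metaflow expects it to live and compare the paths.
for name in ("event_matrix", "event_matrix_t_scaled", "event_user_scaled_matrix", "tester"):
    artifact = getattr(run.end_task.artifacts, name)
    print(name, artifact._object)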

Source: I am a Metaflow developer.

Upvotes: 4
