user1983682

Reputation: 317

AWS Glue DynamicFrames and Push Down Predicate

I am writing an ETL script for AWS Glue whose source is JSON files stored in S3. I create a DynamicFrame and try to use push-down predicate logic to restrict the data coming in:

```python
import datetime
import time

# Define the data restrictor predicate
now = str(int(round(time.time() * 1000)))
now_minus_7_date = datetime.datetime.now() - datetime.timedelta(days=7)
now_minus_7 = str(int(time.mktime(now_minus_7_date.timetuple()) * 1000))

last_7_predicate = "\"timestamp BETWEEN '" + now_minus_7 + "' AND '" + now + "'\""
print("Your predicate will be: " + last_7_predicate)
```
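As an aside, the same seven-day window can be computed with plain integer arithmetic on epoch milliseconds, which avoids the round trip through `datetime` and `time.mktime` (the `mktime` route also drops the milliseconds, which is why the lower bound in the error below ends in `000`). A minimal sketch; `last_n_days_window_ms` is a hypothetical helper, not part of Glue:

```python
import time

MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in one day


def last_n_days_window_ms(days, now_ms=None):
    """Return (start, end) epoch-millisecond strings covering the last `days` days.

    Hypothetical helper: the partition values in this table are strings,
    so both bounds are returned as strings.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return str(now_ms - days * MS_PER_DAY), str(now_ms)


# Fixed "now" so the result is reproducible
start, end = last_n_days_window_ms(7, now_ms=1550859644703)
print(start, end)  # 1550254844703 1550859644703
```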

The table has multiple columns and is partitioned by RegionalCenter, Year, Month, Day, Hour, and Timestamp (all partition columns are strings). The error message I am receiving is:

An error occurred while calling o70.getDynamicFrame. User's pushdown predicate: "timestamp BETWEEN '1550254844000' AND '1550859644703'" can not be resolved against partition columns: [regionalcenter,hour,year,timestamp,month,day]

I am new to AWS Glue and Spark, so I am perplexed as to why the `timestamp` predicate cannot be resolved against partition columns that do in fact include `timestamp`. I have verified that the timestamps used in the table are in milliseconds. An example path from our S3 structure:

regionalcenter=Missouri/Year=2019/Month=2/Day=11/Hour=22/Timestamp=1549924089246
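Each `key=value` segment of such a path becomes a partition column. The error message lists the partition names in lower case (`regionalcenter`, `hour`, ...), consistent with the Data Catalog storing column names in lower case. A small sketch that splits the example path into partition keys and values, lower-casing the keys to mirror the names in the error (`parse_partition_path` is a hypothetical helper for illustration only):

```python
def parse_partition_path(path):
    """Split a Hive-style partition path into {column: value}.

    Keys are lower-cased to match the names shown in the Glue error;
    values stay strings, like the table's partition columns.
    """
    partitions = {}
    for segment in path.strip("/").split("/"):
        key, sep, value = segment.partition("=")
        if sep:  # skip segments that are not key=value pairs
            partitions[key.lower()] = value
    return partitions


parts = parse_partition_path(
    "regionalcenter=Missouri/Year=2019/Month=2/Day=11/Hour=22/Timestamp=1549924089246"
)
print(parts)
# {'regionalcenter': 'Missouri', 'year': '2019', 'month': '2', 'day': '11',
#  'hour': '22', 'timestamp': '1549924089246'}
```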

The DynamicFrame code is as follows:

```python
# Read data from table
dynamic_frame = glueContext.create_dynamic_frame.from_catalog(
    database=args['DatabaseName'],
    table_name=args['TableName'],
    transformation_ctx='dynamic_frame',
    push_down_predicate=last_7_predicate)
```

Please let me know what else might be helpful here. Being new to this, I am not entirely certain what else would be of value. Thank you.

Upvotes: 4

Views: 9738

Answers (1)

user1983682

Reputation: 317

Ah, I was including too many quotes: the predicate string must not itself be wrapped in an extra pair of double quotes. Consider this one resolved:

```python
last_7_predicate = "timestamp between '" + now_minus_7 + "' AND '" + now + "'"
```
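To see the difference, here is a minimal sketch that builds both versions of the predicate. With the extra quotes, the whole expression begins and ends with `"`, and Spark SQL most likely parses it as one string literal rather than a condition on the `timestamp` column, hence the "can not be resolved" error. The `prune` function is a hypothetical, simplified stand-in for partition pruning: because the partition values are strings, `BETWEEN` compares them lexicographically, which matches numeric order only while every value has the same number of digits (true for 13-digit millisecond timestamps).

```python
def build_between_predicate(column, low, high):
    """Build a push-down predicate string with no extra surrounding quotes."""
    return f"{column} BETWEEN '{low}' AND '{high}'"


def prune(partition_values, low, high):
    """Mimic an inclusive string BETWEEN over string partition values."""
    return [v for v in partition_values if low <= v <= high]


low, high = "1550254844000", "1550859644703"
broken = '"' + build_between_predicate("timestamp", low, high) + '"'
fixed = build_between_predicate("timestamp", low, high)
print(broken)  # "timestamp BETWEEN '1550254844000' AND '1550859644703'"
print(fixed)   # timestamp BETWEEN '1550254844000' AND '1550859644703'

values = ["1549924089246", "1550300000000", "1550859644703", "1550900000000"]
print(prune(values, low, high))  # ['1550300000000', '1550859644703']
```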

Upvotes: 4
