user3462649
user3462649

Reputation: 1677

Can AWS Glue process records row wise

I have requirement to process records from one redshift cluster to another row wise. We want to process row wise because we want to handle failed/invalid records in different way. And other benefit is we want to avoid batch reprocessing in case of one record failure. So, wanted to check if AWS Glue is suitable for that or not? If this is not suitable any other tool which provides row processing functionality?

Upvotes: 0

Views: 1139

Answers (1)

Jon Scott
Jon Scott

Reputation: 4354

AWS glue allows you to implement your own PySpark scripts as part of the transformation process.

Pyspark allows implementation of a function to run against each row.

There are many ways to do this, for example:

def f_udf(x):
    return (x + 1)
df2 = df.withColumn("result", max_udf(df.col1))

thi runs the function f_udf for each row of df and produces df2.

AWS Glue specific documentation on this can be found here

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html#aws-glue-api-crawler-pyspark-extensions-dynamic-frame-map

Upvotes: 1

Related Questions