Reputation: 1677
I have a requirement to process records row by row from one Redshift cluster to another. We want row-wise processing because we want to handle failed/invalid records differently, and another benefit is that we avoid reprocessing a whole batch when a single record fails. So I wanted to check whether AWS Glue is suitable for this. If it is not, is there another tool that provides row-level processing?
Upvotes: 0
Views: 1139
Reputation: 4354
AWS Glue allows you to implement your own PySpark scripts as part of the transformation process.
PySpark lets you apply a function to each row.
There are many ways to do this, for example:
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

# Plain Python function applied to one column value per row
def f(x):
    return x + 1

f_udf = udf(f, IntegerType())
df2 = df.withColumn("result", f_udf(df.col1))
This runs the function f_udf on the col1 value of each row of df and produces df2 with an extra result column.
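Since you want to route failed/invalid records differently, here is a minimal sketch of one way to do that with the same UDF approach; the column name amount and the validation rule are just placeholders for your own schema and checks:

from pyspark.sql.functions import udf, col
from pyspark.sql.types import BooleanType

# Hypothetical validation rule; replace with your own checks
def is_valid(amount):
    return amount is not None and amount >= 0

is_valid_udf = udf(is_valid, BooleanType())

flagged = df.withColumn("is_valid", is_valid_udf(col("amount")))
valid_df = flagged.filter(col("is_valid")).drop("is_valid")       # write to the target Redshift cluster
invalid_df = flagged.filter(~col("is_valid")).drop("is_valid")    # send to your error handling path

This way a bad record only ends up in invalid_df and does not force the rest of the batch to be reprocessed.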
AWS Glue-specific documentation on this can be found here
Upvotes: 1