Reputation: 1313
I am a PySpark newbie and need a little help resolving the syntax error below:
ids_2_update = df_to_update.select("id_pk")
# the line below is obviously giving me an exception: …can only concatenate str (not "list") to str' …
connection_options["preactions"] = "delete from my_schema.my_table where id_pk in("+ids_2_update.rdd.flatMap(lambda x: x).collect()+");"
appended_dynamic_df = DynamicFrame.fromDF(appended_df, glueContext, "convert_ctx")
glueContext.write_from_options(frame_or_dfc=appended_dynamic_df, connection_type=redshift_connection_type,connection_options=connection_options)
Any idea how I can do this?
Disclaimer: I need to use the PySpark API, not PySpark SQL.
Upvotes: 0
Views: 2923
Reputation: 106
The problem is that rdd.collect() returns a list of the elements, and you cannot concatenate a string and a list. You first need to convert the list to a comma-separated string before putting it in the in clause. You could try something like this:
connection_options["preactions"] = "delete from my_schema.my_table where id_pk in("+','.join(ids_2_update.rdd.flatMap(lambda x: x).collect())+");"
This assumes that the elements in the column are strings; otherwise, you should cast them to string first.
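For a numeric id_pk column, a minimal sketch of that cast (using a plain list to stand in for the collect() result, so the names here are illustrative, not from your job):

```python
# Stand-in for ids_2_update.rdd.flatMap(lambda x: x).collect() on a numeric column
ids = [1, 2, 3]

# Cast each value to str before joining, since join() only accepts strings
in_clause = ",".join(str(x) for x in ids)

preaction = "delete from my_schema.my_table where id_pk in(" + in_clause + ");"
print(preaction)  # delete from my_schema.my_table where id_pk in(1,2,3);
```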
Upvotes: 1