zhifff

Reputation: 309

How do I read in tons of JSON files using glueContext.create_dynamic_frame_from_options?

I really hope someone can help me with this.

I want to read in all of the JSON files under the path "s3://.../year=2019/month=11/day=06/". How do I do that with glueContext.create_dynamic_frame_from_options?

If I do

    glueContext.create_dynamic_frame_from_options(
        "s3",
        format="json",
        connection_options={"paths": ["s3://.../year=2019/month=11/day=06/"]})

it won't work.

I had to list every single sub-prefix, and I feel there should be a better way. For example, I had to do this:

    df0 = glueContext.create_dynamic_frame_from_options(
        "s3",
        format="json",
        connection_options={"paths": [
            "s3://.../year=2019/month=11/day=06/hour=20/minute=12/",
            "s3://.../year=2019/month=11/day=06/hour=20/minute=13/",
            "s3://.../year=2019/month=11/day=06/hour=20/minute=14/",
            "s3://.../year=2019/month=11/day=06/hour=20/minute=15/",
            "s3://.../year=2019/month=11/day=06/hour=20/minute=16/",
            ....
        ]})

I have thousands of sub-prefixes to list, so I would really appreciate any guidance on how to make my life easier. Thank you!

Upvotes: 0

Views: 1490

Answers (2)

zhifff

Reputation: 309

I found the solution: use the "recurse" option when reading a large group of files.
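As a minimal sketch of what that looks like, with the bucket path elided just as in the question: passing "recurse": True in connection_options makes Glue read files in all sub-prefixes under the given paths, so only the top-level day prefix needs to be listed.

    # "recurse": True tells Glue to descend into every hour=/minute=
    # sub-prefix under day=06/, so only the top path needs listing.
    df0 = glueContext.create_dynamic_frame_from_options(
        "s3",
        format="json",
        connection_options={
            "paths": ["s3://.../year=2019/month=11/day=06/"],
            "recurse": True,
        })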

Upvotes: 1

Ngenator

Reputation: 11269

You're going to want to use a Glue Crawler to create tables in the Glue Data Catalog. You can then use the tables via:

# Read from the table the crawler created in the Data Catalog
glueContext.create_dynamic_frame.from_catalog(
    database="mydb",
    table_name="mytable")

This AWS blog post explains how to work with partitioned data in Glue: https://aws.amazon.com/blogs/big-data/work-with-partitioned-data-in-aws-glue/
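Since the question's data is already laid out by year/month/day, you can also push a partition predicate down to the catalog read so only the partitions you need are loaded. A minimal sketch, reusing the hypothetical "mydb"/"mytable" names above and assuming the crawler registered year, month, and day as partition keys:

    # Load only the 2019-11-06 partition; the partition key names
    # (year, month, day) are assumed from the S3 path layout.
    df = glueContext.create_dynamic_frame.from_catalog(
        database="mydb",
        table_name="mytable",
        push_down_predicate="year='2019' and month='11' and day='06'")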

Upvotes: 0
