Aditya Sahu

Reputation: 63

Reading Partitioned Data through Athena in downstream jobs in pandas

I have two stages in my data pipeline. The first stage reads data from the source and dumps it to an intermediate bucket, and the next stage reads data from this intermediate bucket. I have Athena set up on the intermediate stage, and we are planning to read this partitioned data through Athena rather than reading a file (reason for using Athena: we might have scenarios where we need to read from different partitions, based on some condition, in a single read).

Should we go ahead with this approach? As far as we know, Athena has some limitations when reading data into a pandas DataFrame, for example that we can only fetch 1000 records at a time.

Is there a better solution for this use case? We are using pandas.

Upvotes: 0

Views: 132

Answers (1)

Aditya Sahu

Reputation: 63

We have decided to use AWS Data Wrangler (the `awswrangler` package) for our purposes, since it is more reliable and is meant for the same purpose that we are trying to achieve.
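
For anyone with the same use case, here is a minimal sketch of how reading several partitions in one call could look with `awswrangler`. The table name, partition column, and helper function names are hypothetical and only for illustration. The sketch assumes the partition values are strings and that AWS credentials are already configured.

```python
from typing import Iterable


def build_partition_query(table: str, partition_col: str, values: Iterable[str]) -> str:
    """Build a SELECT that restricts the scan to the given partition values."""
    values = list(values)
    if not values:
        raise ValueError("at least one partition value is required")
    # Double any single quotes so the values are valid SQL string literals.
    quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in values)
    return f'SELECT * FROM "{table}" WHERE "{partition_col}" IN ({quoted})'


def read_partitions(database: str, table: str, partition_col: str, values: Iterable[str]):
    """Run the query on Athena and return the result as a pandas DataFrame."""
    # Imported here so the query builder can be used without awswrangler installed.
    import awswrangler as wr

    sql = build_partition_query(table, partition_col, values)
    # ctas_approach=True wraps the query in a CTAS statement and reads the
    # resulting Parquet files from S3 into the DataFrame.
    return wr.athena.read_sql_query(sql=sql, database=database, ctas_approach=True)
```

A call such as `read_partitions("my_db", "events", "dt", ["2021-01-01", "2021-01-02"])` would then return both partitions in a single DataFrame. If the intermediate data is stored as Parquet, `wr.s3.read_parquet` with `dataset=True` and a `partition_filter` is another option that skips Athena entirely.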

Upvotes: 0
