Reputation: 63
I have two stages in my data pipeline. The first stage reads data from the source and writes it to an intermediate bucket, and the next stage reads the data from this intermediate bucket. I have Athena set up on the intermediate stage, and we are planning to read the partitioned data through Athena rather than reading files directly (reason for using Athena: we might have scenarios where we need to read from different partitions, based on some condition, in a single read).
Should we go ahead with this approach? As far as we know, Athena has some limitations when reading data into a pandas DataFrame; for example, we can only fetch 1000 records at a time.
Is there a better solution for this use case? We are using pandas.
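For context, the 1000-record cap applies to each page returned by Athena's GetQueryResults API, not to the query as a whole, so the full result can be collected by paging. Below is a minimal sketch of that idea, assuming boto3 and a query that has already finished. The helper names `pages_to_records` and `fetch_all` are hypothetical, and the code assumes a SELECT query, where the first row of the first page holds the column names.

```python
def pages_to_records(pages):
    """Flatten Athena GetQueryResults pages into a list of dicts.

    Assumes a SELECT query, where the first row of the first page
    contains the column names.
    """
    header = None
    records = []
    for page in pages:
        for row in page["ResultSet"]["Rows"]:
            # NULL cells come back without a "VarCharValue" key.
            values = [cell.get("VarCharValue") for cell in row["Data"]]
            if header is None:
                header = values  # first row overall is the header
                continue
            records.append(dict(zip(header, values)))
    return records


def fetch_all(query_execution_id):
    """Page through a finished query's results (at most 1000 rows per page)."""
    import boto3  # imported lazily so pages_to_records works without AWS access

    paginator = boto3.client("athena").get_paginator("get_query_results")
    pages = paginator.paginate(QueryExecutionId=query_execution_id)
    return pages_to_records(pages)
```

The records can then be passed to `pandas.DataFrame(records)`. Note that every value arrives as a string this way, so column types would need converting afterwards.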
Upvotes: 0
Views: 132
Reputation: 63
We have decided to use AWS Data Wrangler (the awswrangler package) for our purposes, since it is more reliable and is built for the same purpose that we are trying to achieve.
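A rough sketch of what reading several partitions in a single call could look like with awswrangler. The table, column, and database names are placeholders, the helpers `build_partition_query` and `read_partitions` are hypothetical, and the partition column is assumed to be string-typed.

```python
def build_partition_query(table, partition_col, partition_values):
    """Build a SELECT restricted to the given partition values.

    Assumes a string-typed partition column.
    """
    if not partition_values:
        raise ValueError("need at least one partition value")
    # Double any single quotes so values are safe inside SQL string literals.
    quoted = ", ".join(
        "'" + str(v).replace("'", "''") + "'" for v in partition_values
    )
    return f'SELECT * FROM "{table}" WHERE "{partition_col}" IN ({quoted})'


def read_partitions(table, partition_col, partition_values, database):
    """Run the query through Athena and return a pandas DataFrame."""
    import awswrangler as wr  # imported lazily; requires AWS credentials

    sql = build_partition_query(table, partition_col, partition_values)
    return wr.athena.read_sql_query(sql=sql, database=database)
```

For example, `read_partitions("intermediate_events", "dt", ["2021-01-01", "2021-01-02"], "my_database")` would return both partitions in one DataFrame, with all names here being made up for illustration.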
Upvotes: 0