Reputation: 149
I am trying to read data from AWS RDS and write it to Snowflake using Spark. My Spark job opens a JDBC connection to RDS and pulls the data into a dataframe, and I then write that same dataframe to Snowflake using the Snowflake connector.
Problem statement: Writing the data is slow. Even 30 GB of data takes a long time to write.
Solutions I tried:
1) Repartitioning the dataframe before writing.
2) Caching the dataframe.
3) Taking a count of the dataframe before writing, to reduce scan time at write.
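For reference, the pipeline described above can be sketched roughly as follows. This is a minimal sketch under assumptions, not the actual job: the helper names (`jdbc_read_options`, `snowflake_write_options`, `copy_table`), the append write mode, and all URLs, table names, and column names are hypothetical. It uses Spark's standard JDBC options (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`, `fetchsize`), since a JDBC read without a partition column is executed as a single query in one task, and repartitioning afterwards may not help if that single-threaded read is the slow part.

```python
from typing import Dict


def jdbc_read_options(url: str, table: str, user: str, password: str,
                      partition_column: str, lower_bound: int,
                      upper_bound: int, num_partitions: int,
                      fetch_size: int = 10000) -> Dict[str, str]:
    """Build options for a partitioned Spark JDBC read.

    partition_column is assumed to be a numeric column of the source table;
    the bounds only decide how the range is split, they do not filter rows.
    """
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
        "fetchsize": str(fetch_size),
    }


def snowflake_write_options(sf_url: str, user: str, password: str,
                            database: str, schema: str,
                            warehouse: str) -> Dict[str, str]:
    """Build connection options for the Snowflake Spark connector."""
    return {
        "sfURL": sf_url,
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }


def copy_table(spark, read_opts: Dict[str, str],
               write_opts: Dict[str, str], target_table: str) -> None:
    """Read from RDS over JDBC in parallel and write to Snowflake.

    `spark` is an existing SparkSession with the JDBC driver and the
    Snowflake connector on its classpath. Append mode is an assumption.
    """
    df = spark.read.format("jdbc").options(**read_opts).load()
    (df.write
       .format("net.snowflake.spark.snowflake")
       .options(**write_opts)
       .option("dbtable", target_table)
       .mode("append")
       .save())
```

With these helpers, a call such as `copy_table(spark, read_opts, write_opts, "ORDERS")` would split the read into `numPartitions` range queries on the chosen column. Whether this helps depends on where the time is actually spent, which the Spark UI stage timings should show.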
Upvotes: 1
Views: 3208
Reputation: 541
It may have been a while since this question was asked. If you are preparing the dataframe, or using another tool to prepare your data for the move to Snowflake, the Python connector integrates very nicely. The recommendations in the comments above are great general advice for troubleshooting the query. Were you able to resolve the JDBC connection issue with the recent updates?
Some other troubleshooting to consider:
Let me know what you think. I would love to hear how you solved it.
Upvotes: 0