I am looking for a way to run a SELECT * FROM TABLE query across different databases, to name a few: Postgres, Teradata, MySQL, BigQuery and Redshift. I want to parallelize this query so that I can spawn multiple threads to read the data. Each thread will read n records and dump them to a file (the number of records is parameterized). Example: Table1 has 200 records, so I spawn 4 threads, each reading 50 records and writing them to a separate file, which gives 4 files in total. I have found LIMIT/OFFSET, but it is not generic enough to be used across these platforms. I am using a JDBC ResultSet.
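One way to avoid LIMIT/OFFSET is to partition on a key instead. The Java sketch below assumes the table has an integer column (hypothetically `id`) whose values are reasonably dense, and that its bounds are fetched first with something like SELECT MIN(id), MAX(id) FROM table1. Each thread then runs a query that uses only >= and < comparisons, which is ordinary SQL for all five databases. I have not verified `?` placeholder support in every driver, BigQuery's in particular. `KeyRangeReader`, `split`, `rangeQuery`, `readInParallel` and `RowSink` are illustrative names, not an existing library.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: partitions a table on an integer key column instead of using LIMIT/OFFSET.
final class KeyRangeReader {

    /** Hypothetical callback that receives one thread's rows and its part number (e.g. to write part-N.csv). */
    interface RowSink {
        void write(ResultSet rows, int part) throws Exception;
    }

    /** Splits the inclusive key range [minKey, maxKey] into `parts` half-open ranges [lo, hi). */
    static List<long[]> split(long minKey, long maxKey, int parts) {
        if (parts < 1 || maxKey < minKey) throw new IllegalArgumentException("bad range or part count");
        long total = maxKey - minKey + 1;   // number of key values in the range
        long base = total / parts;
        long extra = total % parts;          // the remainder goes to the first `extra` ranges
        List<long[]> ranges = new ArrayList<>();
        long lo = minKey;
        for (int i = 0; i < parts; i++) {
            long size = base + (i < extra ? 1 : 0);
            ranges.add(new long[] {lo, lo + size});
            lo += size;
        }
        return ranges;
    }

    /** Plain comparison predicates only; table and keyColumn must be trusted identifiers (they are concatenated). */
    static String rangeQuery(String table, String keyColumn) {
        return "SELECT * FROM " + table + " WHERE " + keyColumn + " >= ? AND " + keyColumn + " < ?";
    }

    /** Runs one range query per thread, each on its own Connection, and hands the rows to the sink. */
    static void readInParallel(String jdbcUrl, String table, String keyColumn,
                               long minKey, long maxKey, int threads, RowSink sink) throws Exception {
        String sql = rangeQuery(table, keyColumn);
        List<long[]> ranges = split(minKey, maxKey, threads);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Object>> results = new ArrayList<>();
            for (int i = 0; i < ranges.size(); i++) {
                final int part = i;
                final long[] range = ranges.get(i);
                results.add(pool.submit(() -> {
                    // One Connection per thread rather than sharing a single Connection across threads.
                    try (Connection conn = DriverManager.getConnection(jdbcUrl);
                         PreparedStatement ps = conn.prepareStatement(sql)) {
                        ps.setLong(1, range[0]);
                        ps.setLong(2, range[1]);
                        try (ResultSet rs = ps.executeQuery()) {
                            sink.write(rs, part);
                        }
                    }
                    return null;
                }));
            }
            for (Future<Object> f : results) {
                f.get(); // rethrows (wrapped in ExecutionException) any exception raised in a worker
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

With ids 1 to 200 and 4 threads, `split(1, 200, 4)` yields [1,51), [51,101), [101,151) and [151,201), i.e. 50 rows per file, matching the Table1 example. If the key has gaps, the ranges still cover every row, but the per-file counts are no longer equal. In that case this approach gives balanced parallel reads rather than exactly n rows per file.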
Is there any way I can achieve this, or do I have to write database-specific implementations? I need this because I may have to deal with millions of records from a single table, and I need to dump them to files of at most n records each. So if I have 1 million records and my limit size is 200k, I'll get 5 files as output.
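If the hard requirement is exactly n records per file, a single portable SELECT * read with a rolling writer avoids dialect-specific paging entirely. The minimal sketch below assumes one JDBC connection streams the rows and a new file is started every `maxRowsPerFile` rows. `ChunkedDumper`, `writeChunks`, `lines` and the part-N.csv naming are hypothetical, and the comma-joining does no CSV quoting. Note that `setFetchSize` is only a hint. The PostgreSQL driver uses a cursor only when auto-commit is off, and MySQL Connector/J needs `Integer.MIN_VALUE` or `useCursorFetch=true`. Otherwise the whole result may be loaded into memory.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper: streams rows once and rolls over to a new file every maxRowsPerFile rows.
final class ChunkedDumper {

    /** Writes each line to dir/part-0.csv, part-1.csv, ...; returns how many files were created. */
    static int writeChunks(Iterator<String> lines, long maxRowsPerFile, Path dir) throws IOException {
        if (maxRowsPerFile < 1) throw new IllegalArgumentException("maxRowsPerFile must be >= 1");
        int files = 0;
        long rowsInFile = 0;
        BufferedWriter out = null;
        try {
            while (lines.hasNext()) {
                if (out == null || rowsInFile == maxRowsPerFile) {
                    if (out != null) out.close(); // current file is full
                    out = Files.newBufferedWriter(dir.resolve("part-" + files + ".csv"), StandardCharsets.UTF_8);
                    files++;
                    rowsInFile = 0;
                }
                out.write(lines.next());
                out.newLine();
                rowsInFile++;
            }
        } finally {
            if (out != null) out.close();
        }
        return files;
    }

    /** Adapts a forward-only ResultSet to an Iterator of comma-joined lines (no CSV quoting or escaping). */
    static Iterator<String> lines(ResultSet rs) throws SQLException {
        final int cols = rs.getMetaData().getColumnCount();
        return new Iterator<String>() {
            private Boolean hasRow; // cached result of rs.next(); null means "not advanced yet"

            @Override
            public boolean hasNext() {
                if (hasRow == null) {
                    try {
                        hasRow = rs.next();
                    } catch (SQLException e) {
                        throw new IllegalStateException(e);
                    }
                }
                return hasRow;
            }

            @Override
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                hasRow = null; // the next hasNext() call advances the cursor
                try {
                    StringBuilder sb = new StringBuilder();
                    for (int c = 1; c <= cols; c++) {
                        if (c > 1) sb.append(',');
                        sb.append(rs.getString(c)); // SQL NULL is written as the text "null"
                    }
                    return sb.toString();
                } catch (SQLException e) {
                    throw new IllegalStateException(e);
                }
            }
        };
    }
}
```

Typical use: call `stmt.setFetchSize(10_000)`, run `stmt.executeQuery("SELECT * FROM table1")`, then pass `ChunkedDumper.lines(rs)` to `writeChunks` with a limit of 200_000. With 1 million rows, that produces part-0.csv through part-4.csv, i.e. ceil(1,000,000 / 200,000) = 5 files. Parallelism can still be added on the writing side, but the read stays a single, portable query.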
Any hints, suggestions or help are appreciated. Thank you.