r123
r123

Reputation: 618

How to model Amazon Redshift external table for changing column structure?

I have a 5 columns, c1, c2, c3, c4, c5 in source table (MySQL). I exported the table and added in S3 like:

s3://data/table_name/YYYYMMDD/table_nameYYYYMMDDHHMMSS.csv

And I exported the data for the 3 months of data.

In later stages, two new columns say c6 and c7 added. Now the exported files in s3://data/table_name/YYYYMMDD/table_nameYYYYMMDDHHMMSS.csv will have 7 columns

And there is a chance that c4 column is being removed as well. So the exported files will have 6 columns, i.e. c1, c2, c3, c5, c6, c7

How the external table can be modelled to adapt these changes?

Upvotes: 2

Views: 2580

Answers (2)

Jon Scott
Jon Scott

Reputation: 4354

Actually, if you use redshift spectrum / athena you may be OK!

Adding columns - is fine - just run the glue crawler and the new column will be available

deleting columns - is fine - the deleted columns will just appear as nulls

You cannot change a definition of a column.

I definitely recommend - if you can - to convert to parquet and to explore using partitions for your data.

Also, I recommend that you try it out - add the data one change at a time - re-run the glue crawler as necessary - and use athena to make sure all is ok.

Upvotes: 2

demircioglu
demircioglu

Reputation: 3465

You will need multiple external tables.

Since there is data drift, I think you should create separate tables for each column combinations you have and create a view (with union all) on top them to display data.

So you'll have

table 1 : columns : c1, c2, c3, c4, c5 
table 2 : columns : c1, c2, c3, c4, c5, c6, c7
table 3 : columns : c1, c2, c3, c5, c6, c7

view as

select c1, c2, c3, c4, c5, '' c6, '' c7 from table1 union all
select c1, c2, c3, c4, c5, c6, c7 from table2 union all
select c1, c2, c3, '' c4, c5, c6, c7 from table3

If there is another combination comes up you need to create a 4th table and update your view definition.

Upvotes: 1

Related Questions