On renaming a column in hive table, it removed all values of that column for its previous data prior to deployment

Question

We just went ahead with a deployment for one of our Hive based table. We renamed our column risk_old to risk_new (renamed). The table is period partitioned. However post deployment, we saw a strange behaviour that all data for that renamed column for all prior periods have been deleted/removed by null. Only newer period data is having values. How is this possible? How can we get back the data.

Sample data prior deployment:

id	risk_old	period
1234	high	20230927
2345	low	20230927
1234	high	20230928
2345	low	20230928

Steps for deployment:

Killing old schedule
Deployment of new code (with changes in code sourcing from new source).
ALTER statement for renaming column from risk_old to risk_new
Creating schedule for the new job

Post deployment data:

id	risk_new	period
1234		20230927
2345		20230927
1234		20230928
2345		20230928
1234	high	20230929
2345	low	20230929

We are looking on how the new column risk_new can also populate all periods data? We cannot revert back changes since all downstream apps have also changed their code renaming columns to fetch data from our table. One catch in UAT, if we try altering back table name from risk_new back to risk_old, it is showing older data but not the newer one.

Any leads in how to retain the name column name with all periods of data? Thanks in advance

On renaming a column in hive table, it removed all values of that column for its previous data prior to deployment

Answers (1)

Related Questions