Toxicboy
Toxicboy

Reputation: 11

On renaming a column in hive table, it removed all values of that column for its previous data prior to deployment

We just went ahead with a deployment for one of our Hive based table. We renamed our column risk_old to risk_new (renamed). The table is period partitioned. However post deployment, we saw a strange behaviour that all data for that renamed column for all prior periods have been deleted/removed by null. Only newer period data is having values. How is this possible? How can we get back the data.

Sample data prior deployment:

id risk_old period
1234 high 20230927
2345 low 20230927
1234 high 20230928
2345 low 20230928

Steps for deployment:

  1. Killing old schedule
  2. Deployment of new code (with changes in code sourcing from new source).
  3. ALTER statement for renaming column from risk_old to risk_new
  4. Creating schedule for the new job

Post deployment data:

id risk_new period
1234 20230927
2345 20230927
1234 20230928
2345 20230928
1234 high 20230929
2345 low 20230929

We are looking on how the new column risk_new can also populate all periods data? We cannot revert back changes since all downstream apps have also changed their code renaming columns to fetch data from our table. One catch in UAT, if we try altering back table name from risk_new back to risk_old, it is showing older data but not the newer one.

Any leads in how to retain the name column name with all periods of data? Thanks in advance

Upvotes: 1

Views: 159

Answers (1)

Koushik Roy
Koushik Roy

Reputation: 7407

Renaming columns in hive with tables stored as parquet doesnt work properly. Because the schema info is stored in a file and the schema info changes when you fire alter table statement but underlying data doesnt change. This shows odd/old values for that column.
Clean solution would be -

  1. Create a table with new definition from old table and rename columns while selecting
    create table newtable as select id, risk_old as risk_new, period from oldtable ;

  2. Drop old table drop table oldtable.

  3. Alter new table and rename to oldtable. alter table newtable rename to oldtable;

Upvotes: 0

Related Questions