vvazza
vvazza

Reputation: 397

Changing column name in a Hive external table that contains data

I have a column insert_process_id that I am trying to rename to process_id. This external table is in parquet file format. Please advise how to rename this column.

Upvotes: 5

Views: 10723

Answers (3)

Sameer
Sameer

Reputation: 101

Parquet uses named based look up. So, metadata based rename renders older data inaccessible. There are two ways to solve this (after renaming)

  1. Create a backup for the existing table and insert overwrite existing table by picking older and new column based on a criteria.
  2. Create a view and select old and new column based on a criteria (eg: using a partition column value)

The second option avoids having to backfill again and is a recommended option for future schema changes as well.

Upvotes: 0

KZapagol
KZapagol

Reputation: 928

You can change column name as below.

Syntax:

ALTER TABLE name CHANGE column_name col_spec .. 

Example:

  1. I have created table as below.

    CREATE EXTERNAL Table IF NOT EXISTS Patient_external(
    PatientID int, Name String, City String, Number String )
    COMMENT 'Data about patient from Apollo Hospital'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS PARQUET 
    LOCATION '/user/training/external/data' ;
    
  2. Loaded data into table

    LOAD DATA LOCAL INPATH 
    '/home/cloudera/Desktop/HiveTraining/Dataset/patient_details' INTO TABLE Patient_external; 
    
  3. Describe the table

    describe formatted Patient_external;
    
    
    
    col_name                data_type               comment             
    
    **patientid**               int                                         
    name                    string                                      
    city                    string                                      
    number                  string                                      
    
    Detailed Table Information       
     Database:              default                  
     Owner:                 cloudera                 
     CreateTime:            Mon Mar 25 10:57:23 PDT 2019     
     LastAccessTime:        UNKNOWN                  
     Protect Mode:          None                     
     Retention:             0                        
     Location:          
      hdfs://quickstart.cloudera:8020/user/training/external/data    
     Table Type:            EXTERNAL_TABLE           
     Table Parameters:       
            COLUMN_STATS_ACCURATE   false               
            EXTERNAL                TRUE                
            comment                 Data about patient from Apollo Hospital
            numFiles                0                   
           numRows              -1                  
           rawDataSize          -1                  
         totalSize              0                   
    transient_lastDdlTime   1553536643          
    
      Storage Information        
         SerDe Library:          
          org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe    
         InputFormat:            
              org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat  
         OutputFormat:           
             org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat  
    
  4. Change the column name from PatientID to Patient_ID

         Alter Table Patient_external change PatientID Patient_ID Int; 
    
  5. Now describe the table

        describe formatted Patient_external;
    
     col_name               data_type               comment             
    
    **patient_id**              int                                         
    name                    string                                      
    city                    string                                      
    number                  string                                      
    
    Detailed Table Information       
     Database:              default                  
     Owner:                 cloudera                 
     CreateTime:            Mon Mar 25 10:57:23 PDT 2019     
     LastAccessTime:        UNKNOWN                  
     Protect Mode:          None                     
     Retention:             0                        
     Location:          
      hdfs://quickstart.cloudera:8020/user/training/external/data    
     Table Type:            EXTERNAL_TABLE           
     Table Parameters:       
            COLUMN_STATS_ACCURATE   false               
            EXTERNAL                TRUE                
            comment                 Data about patient from Apollo Hospital
            numFiles                0                   
           numRows              -1                  
           rawDataSize          -1                  
         totalSize              0                   
    transient_lastDdlTime   1553536643          
    
      Storage Information        
         SerDe Library:          
          org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe    
         InputFormat:            
              org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat  
         OutputFormat:           
             org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat 
    

Upvotes: 1

Gaurang Shah
Gaurang Shah

Reputation: 12930

you can use the following syntax

ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type
  [COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT];

As this is an external table, you can drop the table and re-create again with specific changes.

Upvotes: 0

Related Questions