Reputation: 397
I have a column insert_process_id that I am trying to rename to process_id. This external table is in parquet file format. Please advise how to rename this column.
Upvotes: 5
Views: 10723
Reputation: 101
Parquet uses named based look up. So, metadata based rename renders older data inaccessible. There are two ways to solve this (after renaming)
The second option avoids having to backfill again and is a recommended option for future schema changes as well.
Upvotes: 0
Reputation: 928
You can change column name as below.
Syntax:
ALTER TABLE name CHANGE column_name col_spec ..
Example:
I have created table as below.
CREATE EXTERNAL Table IF NOT EXISTS Patient_external(
PatientID int, Name String, City String, Number String )
COMMENT 'Data about patient from Apollo Hospital'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS PARQUET
LOCATION '/user/training/external/data' ;
Loaded data into table
LOAD DATA LOCAL INPATH
'/home/cloudera/Desktop/HiveTraining/Dataset/patient_details' INTO TABLE Patient_external;
Describe the table
describe formatted Patient_external;
col_name data_type comment
**patientid** int
name string
city string
number string
Detailed Table Information
Database: default
Owner: cloudera
CreateTime: Mon Mar 25 10:57:23 PDT 2019
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location:
hdfs://quickstart.cloudera:8020/user/training/external/data
Table Type: EXTERNAL_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE false
EXTERNAL TRUE
comment Data about patient from Apollo Hospital
numFiles 0
numRows -1
rawDataSize -1
totalSize 0
transient_lastDdlTime 1553536643
Storage Information
SerDe Library:
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Change the column name from PatientID to Patient_ID
Alter Table Patient_external change PatientID Patient_ID Int;
Now describe the table
describe formatted Patient_external;
col_name data_type comment
**patient_id** int
name string
city string
number string
Detailed Table Information
Database: default
Owner: cloudera
CreateTime: Mon Mar 25 10:57:23 PDT 2019
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location:
hdfs://quickstart.cloudera:8020/user/training/external/data
Table Type: EXTERNAL_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE false
EXTERNAL TRUE
comment Data about patient from Apollo Hospital
numFiles 0
numRows -1
rawDataSize -1
totalSize 0
transient_lastDdlTime 1553536643
Storage Information
SerDe Library:
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Upvotes: 1
Reputation: 12930
you can use the following syntax
ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type
[COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT];
As this is an external table, you can drop the table and re-create again with specific changes.
Upvotes: 0