Reputation: 437
If it is an avro, orc or parquet table, I can use respective libraries to fetch the schema. But if the input/output format is TXT, and data is stored in csv files, how can I get the schema programatically?
Thanks,
Upvotes: 0
Views: 7165
Reputation: 928
You can use DESCRIBE
statement which diplays metadata about a table such as column names and their Data Types.
The DESCRIBE FORMATTED
displays additional information, in a format familiar to users of Apache Hive.
Example :
I created a table as below.
CREATE TABLE IF NOT EXISTS Employee_Local( EmployeeId INT,Name STRING,
Designation STRING,State STRING, Number STRING)
ROW Format Delimited Fields Terminated by ',' STORED AS Textfile;
DESCRIBE Statement
You can use the abbreviation DESC for the DESCRIBE statement.
hive> DESCRIBE Employee_Local;
OK
employeeid int
name string
designation string
state string
number string
DESCRIBE FORMATTED Statement
hive> describe formatted Employee_Local;
OK
# col_name data_type comment
employeeid int
name string
designation string
state string
number string
# Detailed Table Information
Database: default
Owner: cloudera
CreateTime: Fri Mar 15 10:53:35 PDT 2019
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://quickstart.cloudera:8020/user/hive/warehouse/employee_test
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime 1552672415
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim ,
serialization.format ,
Time taken: 0.544 seconds, Fetched: 31 row(s)
Even you can get schema of a Hive table from Spark Shell as below:
scala> spark.sql("desc formatted test_loop").collect().foreach(println)
[policyid,bigint,null]
[statecode,string,null]
[county,string,null]
[eq_site_limit,bigint,null]
[hu_site_limit,bigint,null]
[fl_site_limit,bigint,null]
[fr_site_limit,bigint,null]
[tiv_2011,bigint,null]
[tiv_2012,double,null]
[eq_site_deductible,double,null]
[hu_site_deductible,double,null]
[fl_site_deductible,double,null]
[fr_site_deductible,double,null]
[point_latitude,double,null]
[point_longitude,double,null]
[line,string,null]
[construction,string,null]
[point_granularity,bigint,null]
[,,]
[# Detailed Table Information,,]
[Database:,default,]
[Owner:,mapr,]
[Create Time:,Fri May 26 17:56:04 EDT 2017,]
[Last Access Time:,Wed Dec 31 19:00:00 EST 1969,]
[Location:,maprfs:/user/hv2/warehouse/test_loop,]
[Table Type:,MANAGED,]
[Table Parameters:,,]
[ rawDataSize,254192494,]
[ numFiles,1,]
[ transient_lastDdlTime,1495845784,]
[ totalSize,251167564,]
[ numRows,3024360,]
[,,]
[# Storage Information,,]
[SerDe Library:,org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,]
[InputFormat:,org.apache.hadoop.mapred.TextInputFormat,]
[OutputFormat:,org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,]
[Compressed:,No,]
[Storage Desc Parameters:,,]
[ serialization.format,1,]
Upvotes: 2