Reputation: 101
I've got a set of files with far more columns than we actually need, and which columns are included (and their order) may vary from file to file. Using this CREATE TABLE:
CREATE EXTERNAL TABLE `test1column`(
  `column3` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://bucketpath/folder'
TBLPROPERTIES (
  'has_encrypted_data'='false',
  'transient_lastDdlTime'='1524150460')
Athena just pulls in the first CSV column, so the output ends up being:
column3
---------
column1
val1
val2
val3
I'm creating these tables programmatically, so I'd rather not have to read through every column name and create a table carrying more data than I need. If it's not yet possible to map only certain columns into an Athena table, then I suppose I'll have to.
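As far as I can tell, with ROW FORMAT DELIMITED the declared columns map to the CSV fields strictly by position (extra trailing fields are simply dropped), so a sketch like this (column names hypothetical) only ever captures the leading fields; there's no way to say "skip to the Nth field":
CREATE EXTERNAL TABLE `test_leading_two`(
  -- maps to the 1st CSV field, whatever it contains
  `column1` string,
  -- maps to the 2nd CSV field
  `column2` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
LOCATION
  's3://bucketpath/folder'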
Upvotes: 1
Views: 2125
Reputation: 51
I had a similar requirement, where I needed to create an external table in Athena from a Parquet file stored in an S3 location. My research suggests that we can extract just the required columns from S3 (Parquet format) as long as we have the column names.
For example: suppose the Parquet file named sample.parquet has three columns, namely col1, col2, and col3, but you wish to create a table in Athena, say table1, with only col1 and col3. The command that worked for me is below.
CREATE EXTERNAL TABLE IF NOT EXISTS table1 (
  col1 string,
  col3 string
)
STORED AS PARQUET
LOCATION 's3://{folder-location}/sample.parquet/'
TBLPROPERTIES ('parquet.compress'='SNAPPY')
The above snippet pulls in only the two columns we want from the Parquet file; since Athena resolves Parquet columns by name rather than by position, col2 is simply skipped.
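To sanity-check the mapping, a quick query against the table (names from the example above) should return data for both columns:
SELECT col1, col3
FROM table1
LIMIT 10;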
For the AWS documentation, please refer to Removing Columns.
Upvotes: 1
Reputation: 993
I think no: you will have to create the table with all the columns, and then specify just the columns you need when selecting from it, as in the sketch below.
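A minimal sketch, reusing the location from the question and assuming the files contain three columns (names hypothetical):
-- declare every column the files contain, in file order
CREATE EXTERNAL TABLE IF NOT EXISTS `test_all_columns`(
  `column1` string,
  `column2` string,
  `column3` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
LOCATION
  's3://bucketpath/folder';

-- then project only the column you actually need
SELECT column3 FROM test_all_columns;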
Upvotes: 2