arj
arj

Reputation: 983

AWS CLI Create Athena table from shell script

I am trying to create an Athena table from shell scrip. This is my code

aws athena start-query-execution \
    --query-string "CREATE EXTERNAL TABLE <table name>( `user_id` string, `file_name` string, `file_type` string, `count` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( 'escapeChar'='\\', 'quoteChar'='\"', 'separatorChar'=',') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '<s3 location>/"{$client_id}/ \
    --work-group "primary" \
    --query-execution-context Database=<dbname>\
    --result-configuration "OutputLocation="<s3 location>"

When i am executing the query it shows below error

start.sh: 1: start.sh: user_id: not found
start.sh: 1: start.sh: file_name: not found
start.sh: 1: start.sh: file_type: not found
start.sh: 1: start.sh: count: not found

An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:8: mismatched input 'EXTERNAL'. Expecting: 'OR', 'SCHEMA', 'TABLE', 'VIEW'
generating report
{
    "QueryExecutionId": "4c5b47cc-a87b-4d9a-8048-537a12534b0c"
}

Above Query is working fine in Athena console

Edit 1

Removed backtick now the query like this

aws athena start-query-execution \
    --query-string "CREATE EXTERNAL TABLE <table name>( user_id string, file_name string, file_type string, count string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( 'escapeChar'='\\', 'quoteChar'='\"', 'separatorChar'=',') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '<s3 location>/{$client_id}/'" \
    --work-group "primary" \
    --query-execution-context Database=<dbname>\
    --result-configuration "OutputLocation="<s3 location>"

Now am getting a new error

An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:8: mismatched input 'EXTERNAL'. Expecting: 'OR', 'SCHEMA', 'TABLE', 'VIEW'

Edit 2 During the script execution back slash was removed ,that this the issue. while running am getting like this

SERDEPROPERTIES ( 'escapeChar'='\', 'quoteChar'='"', 'separatorChar'=',')

Actual expectation will be like this

SERDEPROPERTIES ( 'escapeChar'='\\', 'quoteChar'='\"', 'separatorChar'=',')

Does any way to resolve this?

Upvotes: 0

Views: 2060

Answers (1)

Theo
Theo

Reputation: 132862

In your shell, backticks are command substitutions – or in other words: if you put something in backticks it's executed as a command and whatever it prints out is put into the string. Your shell does this in double quoted strings, but most shells won't do it in single quoted strings.

I understand you're using double quotes since the SQL contains single quoted strings. You can either surround the SQL with single quotes and escape all the single quotes inside, or you can escape the backticks.

However, the backticks are optional in Athena DDL, so the easiest way forward is to just remove them. Unless one of your columns is called "date" or "table", or any other reserved word you don't need the backticks.

A completely different approach would be to use the Glue Data Catalog API, this is what Athena does behind the scenes. It's a bit verbose, but so is creating DDL statements.

Upvotes: 1

Related Questions