Reputation: 33749
How can I work around the following error in Amazon Athena?
HIVE_INVALID_METADATA: com.facebook.presto.hive.DataCatalogException: Error: : expected at the position 8 of 'struct<x-amz-request-id:string,action:string,label:string,category:string,when:string>' but '-' is found. (Service: null; Status Code: 0; Error Code: null; Request ID: null)
When looking at position 8 in the database table connected to Athena generated by AWS Glue, I can see that it has a column named attributes
with a corresponding struct data type:
struct <
x-amz-request-id:string,
action:string,
label:string,
category:string,
when:string
>
My guess is that the error occurs because the attributes
field is not always populated (c.f. the _session.start
event below) and does not always contain all fields (e.g. the DocumentHandling
event below does not contain the attributes.x-amz-request-id
field). What is the appropriate way to address this problem? Can I make a column optional in Glue? Can (should?) Glue fill the struct with empty strings? Other options?
Background: I have the following backend structure:
I can see PinPoint events successfully being added to json files in S3, e.g.
First event in a file:
{
"event_type": "_session.start",
"event_timestamp": 1524835188519,
"arrival_timestamp": 1524835192884,
"event_version": "3.1",
"application": {
"app_id": "[an app id]",
"cognito_identity_pool_id": "[a pool id]",
"sdk": {
"name": "Mozilla",
"version": "5.0"
}
},
"client": {
"client_id": "[a client id]",
"cognito_id": "[a cognito id]"
},
"device": {
"locale": {
"code": "en_GB",
"country": "GB",
"language": "en"
},
"make": "generic web browser",
"model": "Unknown",
"platform": {
"name": "macos",
"version": "10.12.6"
}
},
"session": {
"session_id": "[a session id]",
"start_timestamp": 1524835188519
},
"attributes": {},
"client_context": {
"custom": {
"legacy_identifier": "50ebf77917c74f9590c0c0abbe5522d2"
}
},
"awsAccountId": "672057540201"
}
Second event in the same file:
{
"event_type": "DocumentHandling",
"event_timestamp": 1524835194932,
"arrival_timestamp": 1524835200692,
"event_version": "3.1",
"application": {
"app_id": "[an app id]",
"cognito_identity_pool_id": "[a pool id]",
"sdk": {
"name": "Mozilla",
"version": "5.0"
}
},
"client": {
"client_id": "[a client id]",
"cognito_id": "[a cognito id]"
},
"device": {
"locale": {
"code": "en_GB",
"country": "GB",
"language": "en"
},
"make": "generic web browser",
"model": "Unknown",
"platform": {
"name": "macos",
"version": "10.12.6"
}
},
"session": {},
"attributes": {
"action": "Button-click",
"label": "FavoriteStar",
"category": "Navigation"
},
"metrics": {
"details": 40.0
},
"client_context": {
"custom": {
"legacy_identifier": "50ebf77917c74f9590c0c0abbe5522d2"
}
},
"awsAccountId": "[aws account id]"
}
Next, AWS Glue has generated a database and a table. Specifically, I see that there is a column named attributes
that has the value of
struct <
x-amz-request-id:string,
action:string,
label:string,
category:string,
when:string
>
However, when I attempt to Preview table
from Athena, i.e. execute the query
SELECT * FROM "pinpoint-test"."pinpoint_testfirehose" limit 10;
I get the error message described earlier.
Side note, I have tried to remove the attributes
field (by editing the database table from Glue), but that results in Internal error
when executing the SQL query from Athena.
Upvotes: 4
Views: 9635
Reputation: 11
I believe the problem is your struct element name: x-amz-request-id
The "-" in the name.
I'm currently dealing with a similar issue since my elements in my struct have "::" in the name.
Sample data:
some_key: {
"system::date": date,
"system::nps_rating": 0
}
Glue derived struct Schema (it tried to escape them with ):
struct <
system\:\:date:String
system\:\:nps_rating:Int
>
But that still gives me an error in Athena.
I don't have a good solution for this other than changing Struct to STRING and trying to process the data that way.
Upvotes: 1
Reputation: 96
Use tick (`) when table name has - in the name Example: SELECT * FROM `pinpoint-test`.`pinpoint_testfirehose` limit 10;
Make sure you select "default" database on the left pane.
Upvotes: 0
Reputation: 596
This is a known limitation. Athena table and database names allow only underscore special characters#
Athena table and database names cannot contain special characters, other than underscore (_). Source: http://docs.aws.amazon.com/athena/latest/ug/known-limitations.html
Upvotes: 4