matsev
matsev

Reputation: 33749

HIVE_INVALID_METADATA in Amazon Athena

How can I work around the following error in Amazon Athena?

HIVE_INVALID_METADATA: com.facebook.presto.hive.DataCatalogException: Error: : expected at the position 8 of 'struct<x-amz-request-id:string,action:string,label:string,category:string,when:string>' but '-' is found. (Service: null; Status Code: 0; Error Code: null; Request ID: null)

When looking at position 8 in the database table connected to Athena generated by AWS Glue, I can see that it has a column named attributes with a corresponding struct data type:

struct <
    x-amz-request-id:string,
    action:string,
    label:string,
    category:string,
    when:string
>

My guess is that the error occurs because the attributes field is not always populated (c.f. the _session.start event below) and does not always contain all fields (e.g. the DocumentHandling event below does not contain the attributes.x-amz-request-id field). What is the appropriate way to address this problem? Can I make a column optional in Glue? Can (should?) Glue fill the struct with empty strings? Other options?


Background: I have the following backend structure:

I can see PinPoint events successfully being added to json files in S3, e.g.

First event in a file:

{
    "event_type": "_session.start",
    "event_timestamp": 1524835188519,
    "arrival_timestamp": 1524835192884,
    "event_version": "3.1",
    "application": {
        "app_id": "[an app id]",
        "cognito_identity_pool_id": "[a pool id]",
        "sdk": {
            "name": "Mozilla",
            "version": "5.0"
        }
    },
    "client": {
        "client_id": "[a client id]",
        "cognito_id": "[a cognito id]"
    },
    "device": {
        "locale": {
            "code": "en_GB",
            "country": "GB",
            "language": "en"
        },
        "make": "generic web browser",
        "model": "Unknown",
        "platform": {
            "name": "macos",
            "version": "10.12.6"
        }
    },
    "session": {
        "session_id": "[a session id]",
        "start_timestamp": 1524835188519
    },
    "attributes": {},
    "client_context": {
        "custom": {
            "legacy_identifier": "50ebf77917c74f9590c0c0abbe5522d2"
        }
    },
    "awsAccountId": "672057540201"
}

Second event in the same file:

{
    "event_type": "DocumentHandling",
    "event_timestamp": 1524835194932,
    "arrival_timestamp": 1524835200692,
    "event_version": "3.1",
    "application": {
        "app_id": "[an app id]",
        "cognito_identity_pool_id": "[a pool id]",
        "sdk": {
            "name": "Mozilla",
            "version": "5.0"
        }
    },
    "client": {
        "client_id": "[a client id]",
        "cognito_id": "[a cognito id]"
    },
    "device": {
        "locale": {
            "code": "en_GB",
            "country": "GB",
            "language": "en"
        },
        "make": "generic web browser",
        "model": "Unknown",
        "platform": {
            "name": "macos",
            "version": "10.12.6"
        }
    },
    "session": {},
    "attributes": {
        "action": "Button-click",
        "label": "FavoriteStar",
        "category": "Navigation"
    },
    "metrics": {
        "details": 40.0
    },
    "client_context": {
        "custom": {
            "legacy_identifier": "50ebf77917c74f9590c0c0abbe5522d2"
        }
    },
    "awsAccountId": "[aws account id]"
}

Next, AWS Glue has generated a database and a table. Specifically, I see that there is a column named attributes that has the value of

struct <
    x-amz-request-id:string,
    action:string,
    label:string,
    category:string,
    when:string
>

However, when I attempt to Preview table from Athena, i.e. execute the query

SELECT * FROM "pinpoint-test"."pinpoint_testfirehose" limit 10;

I get the error message described earlier.

Side note, I have tried to remove the attributes field (by editing the database table from Glue), but that results in Internal error when executing the SQL query from Athena.

Upvotes: 4

Views: 9635

Answers (3)

Wojciech
Wojciech

Reputation: 11

I believe the problem is your struct element name: x-amz-request-id The "-" in the name. I'm currently dealing with a similar issue since my elements in my struct have "::" in the name.
Sample data:

some_key: {
  "system::date": date,
  "system::nps_rating": 0
}

Glue derived struct Schema (it tried to escape them with ):

struct <
    system\:\:date:String
    system\:\:nps_rating:Int
>

But that still gives me an error in Athena.
I don't have a good solution for this other than changing Struct to STRING and trying to process the data that way.

Upvotes: 1

Kiran Kandala
Kiran Kandala

Reputation: 96

Use tick (`) when table name has - in the name Example: SELECT * FROM `pinpoint-test`.`pinpoint_testfirehose` limit 10;

Make sure you select "default" database on the left pane.

Upvotes: 0

vr00n
vr00n

Reputation: 596

This is a known limitation. Athena table and database names allow only underscore special characters#

Athena table and database names cannot contain special characters, other than underscore (_). Source: http://docs.aws.amazon.com/athena/latest/ug/known-limitations.html

Upvotes: 4

Related Questions