Reputation: 1929
I've defined a default crawler on the data directory of an export from dynamodb. I'm trying to get it to give me a structured table instead of a table with a single column of type struct. What do I have to do make get the actual column names in there? I've tried adding custom classifiers and different path expressions but nothing seems to work, and I feel like I'm missing something really obvious.
I'm using the crawler builder inside of glue, which doesn't seem to offer much customization.
Here's the schema from the table generated by the default crawler:
And here's one of the items that I've exported from dynamo:
{
"Item": {
"the_url": {
"S": "/2021/07/06/****redacted****.html"
},
"as_of_when": {
"S": "2021-09-01"
},
"user_hashes": {
"SS": [
"****redacted*****"
]
},
"user_id_hashes": {
"SS": [
"u3MeXDcpQm0ACYuUv6TMrg=="
]
},
"accumulated_count": {
"N": "1"
},
"today_count": {
"N": "1"
}
}
}
Upvotes: 2
Views: 1493
Reputation: 132862
The way Athena interprets JSON data means that your data has only a single column, Item
. Athena doesn't have any mechanism to map arbitrary parts of a JSON object to columns, it can only map top-level attributes to columns.
If you want other parts of the objects as columns you will either have to create a new table with transformed data, or create a view with the attributes as columns, e.g.
CREATE OR REPLACE VIEW attributes_as_top_level_columns AS
SELECT
item.the_url.S AS the_url,
CAST(item.as_of_when.S AS DATE) AS as_of_when,
item.user_hashes.SS AS user_hashes,
item.user_id_hashes.SS AS user_id_hashes,
item.accumulated_count.N AS accumulated_count,
item.today_count.N AS today_count
FROM items
In the example above I've also flattened the data type keys (S
, SS
, N
) and I converted the date string to a date.
Upvotes: 2