Reputation: 559
I have some JSON that contains different events with different schemas as follows:
{
"events": [{
"key": "an_action",
"properties": {
"user": "111111111111",
"datetime": "2018-9-05 10:00:00",
"value_a": "123",
"value_b": "123",
"value_c": "123"
}
},
{
"key": "another_action",
"properties": {
"user": "111111111111",
"datetime": "2018-9-05 10:00:00",
"quantity": "1",
"cash": "£123",
"something": "else"
}
},
{
"key": "one_more_action",
"properties": {
"user": "111111111111",
"datetime": "2018-9-05 10:00:00",
"activated": "true"
}
}]
}
When I attempt to parse this with Glue, I end up with a single table with a single column:
events : array
I've tried using a JSON classifier of $.events[*] but this doesn't help.
I feel the problem is that my input JSON structure doesn't work well with a Glue crawler.
Altering the JSON structure seems like the correct thing to do before reaching Glue, but it is not clear from the Glue docs if there is a recommended structure for handling mixed schema events in a single block of JSON.
In my JSON the "key" contains the name of the event and "properties" is the list of properties.
I'd ultimately like to have a table in Redshift for each event (for example an_action, another_action, one_more_action). From there I can look at loading into a fact/dimension star schema, but that's for later.
In short my question is can Glue create multiple tables from a mixed schema JSON? If so what is a preferred JSON structure for doing so that would allow the following:
Thanks
Upvotes: 1
Views: 343
Reputation: 559
Contacted Support and was told that Glue crawlers don't support multiple schemas in the same file. So I have modified the original JSON to have one schema per file and multiple records per file.
Upvotes: 1