novabracket
novabracket

Reputation: 559

Suggested JSON structure for mixed schema events to be crawled by Glue

I have some JSON that contains different events with different schemas as follows:

{
    "events": [{
        "key": "an_action",
        "properties": {
            "user": "111111111111",
            "datetime": "2018-9-05 10:00:00",
            "value_a": "123",
            "value_b": "123",
            "value_c": "123"
        }
    },
    {
        "key": "another_action",
        "properties": {
            "user": "111111111111",
            "datetime": "2018-9-05 10:00:00",
            "quantity": "1",
            "cash": "£123",
            "something": "else"
        }
    },
    {
        "key": "one_more_action",
        "properties": {
            "user": "111111111111",
            "datetime": "2018-9-05 10:00:00",
            "activated": "true"
        }
    }]
}

When I attempt to parse this with Glue, I end up with a single table with a single column:

events : array

I've tried using a JSON classifier of $.events[*] but this doesn't help.

I feel the problem is that my input JSON structure doesn't work well with a Glue crawler.

Altering the JSON structure seems like the correct thing to do before reaching Glue, but it is not clear from the Glue docs if there is a recommended structure for handling mixed schema events in a single block of JSON.

In my JSON the "key" contains the name of the event and "properties" is the list of properties.

I'd ultimately like to have a table in Redshift for each event (for example an_action, another_action, one_more_action). From there I can look at loading into a fact/dimension star schema, but that's for later.

In short my question is can Glue create multiple tables from a mixed schema JSON? If so what is a preferred JSON structure for doing so that would allow the following:

Thanks

Upvotes: 1

Views: 343

Answers (1)

novabracket
novabracket

Reputation: 559

Contacted Support and was told that Glue crawlers don't support multiple schemas in the same file. So I have modified the original JSON to have one schema per file and multiple records per file.

Upvotes: 1

Related Questions