Reputation: 1615
My s3 file structure is:
├── bucket
│ ├── customer_1
│ │ ├── year=2016
│ │ ├── year=2017
│ │ │ ├── month=11
│ │ | │ ├── sometype-2017-11-01.parquet
│ | | | ├── sometype-2017-11-02.parquet
│ | | | ├── ...
│ │ │ ├── month=12
│ │ | │ ├── sometype-2017-12-01.parquet
│ | | | ├── sometype-2017-12-02.parquet
│ | | | ├── ...
│ │ ├── year=2018
│ │ │ ├── month=01
│ │ | │ ├── sometype-2018-01-01.parquet
│ | | | ├── sometype-2018-01-02.parquet
│ | | | ├── ...
│ ├── customer_2
│ │ ├── year=2017
│ │ │ ├── month=11
│ │ | │ ├── moretype-2017-11-01.parquet
│ | | | ├── moretype-2017-11-02.parquet
│ | | | ├── ...
│ │ ├── year=...
I want create separate table for customer_1 and customer_2 with AWS Glue crawler. It is working if i mention path s3://bucket/customer_1
and s3://bucket/customer_2
.
I've tried s3://bucket/customer_*
and s3://bucket/*
, both are not working and can not create table in Glue catalog
Upvotes: 4
Views: 4071
Reputation: 498
I myself faced this issue recently. AWS GLUE Crawlers has this option Grouping behaviour for S3 data
. If the checkbox is not selected it will try to combine schemas. By selecting the checkbox you can ensure that multiple and separate databases are created.
The table level should be the depth from the root of the bucket, from where you want separate tables.
In your case the depth would be 2.
More here
Upvotes: 4
Reputation: 451
Glue's natural tendency is to add similar schemas(when pointed to the parent folder) to the same table with anything over than a 70% match(Assuming, In your case Cust1 and Cust2 have the same schemas). Keeping them in individual folders might create respective partitions based on the folder names.
Upvotes: 2