Reputation: 161
Provided a partitioned fs structure like the following:
logs
└── log_type
└── 2013
├── 07
│ ├── 28
│ │ ├── host1
│ │ │ └── log_file_1.csv
│ │ └── host2
│ │ ├── log_file_1.csv
│ │ └── log_file_2.csv
│ └── 29
│ ├── host1
│ │ └── log_file_1.csv
│ └── host2
│ └── log_file_1.csv
└── 08
I've been trying to create an external table in Impala:
create external table log_type (
field1 string,
field2 string,
...
)
row format delimited fields terminated by '|' location '/logs/log_type/2013/08';
I wish Impala would recurse into the subdirs and load all the csv files; but no cigar. No errors are thrown but no data is loaded into the table.
Different globs like /logs/log_type/2013/08/*/*
or /logs/log_type/2013/08/*/*/*
did not work either.
Is there a way to do this? Or should I restructure the fs - any advice on that?
Upvotes: 4
Views: 7871
Reputation: 126
With the newer versions of impala you can use the
ALTER TABLE name RECOVER PARTITIONS
command. More info
What you have to be careful about is that, the partitioning fields has to be lowercase as the the directory structure is case sensitive but the impala queries are not.
Upvotes: 0
Reputation: 1349
in case you are still searching for an answer. You need to register each individual partition manually.
See here for details Registering External Table
Your schema for the table needs to be adjusted
create external table log_type (
field1 string,
field2 string,
...)
partitioned by (year int, month int, day int, host string)
row format delimited fields terminated by '|';
After you changed your schema, to include year, month, day and host, you recursively have to add each partition to the table.
Something like this
ALTER TABLE log_type ADD PARTITION (year=2013, month=07, day=28, host="host1")
LOCATION '/logs/log_type/2013/07/28/host1';
Afterwards you need to refresh the table in impala.
invalidate log_type;
refresh log_type;
Upvotes: 9
Reputation: 302
Another way to do this might be to use the LOAD DATA
function in Impala. If your data is in a SequenceFile or other less Impala-friendly format (Impala file formats), you can create your external table like Joey does above but instead of ALTER TABLE
, you can do something like
LOAD DATA INPATH '/logs/log_type/2013/07/28/host1/log_file_1.csv' INTO TABLE log_type PARTITION (year=2013, month=07, day=28, host=host1);
Upvotes: 0