Reputation: 320
Yesterday, I was create my first datasource Druid from Hive. Today, I'm not sure that works...
First, I ran the following code for create my Db :
SET hive.druid.broker.address.default = 10.20.173.30:8082;
SET hive.druid.metadata.username = druid;
SET hive.druid.metadata.password = druid_password;
SET hive.druid.metadata.db.type = postgresql;
SET hive.druid.metadata.uri = jdbc:postgresql://10.20.173.31:5432/druid;
CREATE EXTERNAL TABLE test (
`__time` TIMESTAMP,
`userId` STRING,
`lang` STRING,
`location` STRING,
`name` STRING
)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
I can see this datasource on my Hive architecture. How can I know that this datasource is a Druid Datasource and not a Hive table.
I tested this but I don't know if it's a Druid datasource.
DESCRIBE FORMATTED test;
Result
+-------------------------------+----------------------------------------------------+----------------------------------------------------+
| col_name | data_type | comment |
+-------------------------------+----------------------------------------------------+----------------------------------------------------+
| # col_name | data_type | comment |
| __time | timestamp | from deserializer |
| userid | string | from deserializer |
| lang | string | from deserializer |
| location | string | from deserializer |
| name | string | from deserializer |
| # Detailed Table Information | NULL | NULL |
| Database: | druid_datasources | NULL |
| OwnerType: | USER | NULL |
| Owner: | hive | NULL |
| CreateTime: | Tue Oct 15 12:42:22 CEST 2019 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://10.20.173.30:8020/warehouse/tablespace/external/hive/druid_datasources.db/test | NULL |
| Table Type: | EXTERNAL_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | COLUMN_STATS_ACCURATE | {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"__time\":\"true\",\"lang\":\"true\",\"location\":\"true\",\"name\":\"true\",\"userid\":\"true\"}} |
| | EXTERNAL | TRUE |
| | bucketing_version | 2 |
| | druid.datasource | druid_datasources.test ||
| | numFiles | 0 |
| | numRows | 0 |
| | rawDataSize | 0 |
| | storage_handler | org.apache.hadoop.hive.druid.DruidStorageHandler |
| | totalSize | 0 |
| | transient_lastDdlTime | 1571136142 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.druid.serde.DruidSerDe | NULL |
| InputFormat: | null | NULL |
| OutputFormat: | null | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | serialization.format | 1 |
+-------------------------------+----------------------------------------------------+----------------------------------------------------+
I did well or it's a Hive table with Druid parameters ? Someone can explain me more about Hive/Druid interactions ?
Thanks :D
Upvotes: 2
Views: 870
Reputation: 186
I think you registered your druid datasource in hive. Now you can run your queries using hive server on top of this table.
Your table definition look correct to me I think you managed to integrate druid datasoruce with hive. You can see druid related properties in table.
Now when you query the table it will use processing engine depending on the query it will use hive server along with druid. It can use combination of both or one of them on standalone basis to execute query. It depends whether that query can be converted to druid query or not.
You can refer to this doc for more info on Hive/Druid interactions : https://cwiki.apache.org/confluence/display/Hive/Druid+Integration (refer:Querying Druid from Hive)
Upvotes: 2