Reputation: 313
In Hive, how do we search a table by name in all databases?
I am a Teradata user. Is there any counterpart of systems tables (present in Teradata) like dbc.tables, dbc.columns which are present in HIVE?
Upvotes: 11
Views: 51275
Reputation: 21
That's an extention of Mantej Singh's answer: you can use pyspark to find tables across all Hive databases (not just one):
from functools import reduce
from pyspark import SparkContext, HiveContext
from pyspark.sql import DataFrame
sc = SparkContext()
sqlContext = HiveContext(sc)
dbnames = [row.databaseName for row in sqlContext.sql('SHOW DATABASES').collect()]
tnames = []
for dbname in dbnames:
tnames.append(sqlContext.sql('SHOW TABLES IN {} LIKE "%your_pattern%"'.format(dbname)))
tables = reduce(DataFrame.union, tnames)
tables.show()
The way to do this is to iterate through the databases searching for table with a specified name.
Upvotes: 2
Reputation: 117
Searching for tables with name containing infob across all Hive databases
for i in `hive -e "show schemas"`; do echo "Hive DB: $i"; hive -e "use $i; show tables"|grep "infob"; done
Upvotes: 3
Reputation: 410
You can use SQL like to search a table. Example: I want to search a table with the name starting from "Benchmark" I don't know the rest of it.
Input in HIVE CLI:
show tables like 'ben*'
Output:
+-----------------------+--+
| tab_name |
+-----------------------+--+
| benchmark_core_month |
| benchmark_core_qtr |
| benchmark_core_year |
+-----------------------+--+
3 rows selected (0.224 seconds)
Or you can try below command if you are using Beeline
!tables
Note: It will work with Beeline only (JDBC client based)
More about beeline: http://blog.cloudera.com/blog/2014/02/migrating-from-hive-cli-to-beeline-a-primer/
Upvotes: 12
Reputation: 2516
@hisi's answer is elegant. However it induce an error with lacking memory for GC on our cluster. So, there is another less elegant approach that works for me.
Let foo
is the table name to search. So
hadoop fs -ls -R -C /apps/hive/warehouse/ 2>/dev/null | grep '/apps/hive/warehouse/[^/]\{1,\}/foo$'
If one does not remember exact name of table but only substring bar
in table name, then command is
hadoop fs -ls -R -C /apps/hive/warehouse/ 2>/dev/null | grep '/apps/hive/warehouse/[^/]\{1,\}/[^/]\{1,\}$' | grep bar
Upvotes: 1
Reputation: 4797
you can also use hdfs to find a table in all databases:
the path of hive databases is:
/apps/hive/warehouse/
so, by using hdfs :
hdfs dfs -find /apps/hive/warehouse/ -name t*
Upvotes: 5
Reputation: 56
Hive stores all its metadata information in Metastore. Metastore schema can be found at: link: https://issues.apache.org/jira/secure/attachment/12471108/HiveMetaStore.pdf
It has tables like DBS for database, TBLS for tables and Columns. You may use appropriate join, to find out table name or column names.
Upvotes: 2
Reputation: 44991
You should query the metastore.
You can find the connection properties within hive-site.xml
bash
<$HIVE_HOME/conf/hive-site.xml grep -A1 jdo
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://127.0.0.1/metastore?createDatabaseIfNotExist=true</value>
--
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
--
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
--
<name>javax.jdo.option.ConnectionPassword</name>
<value>cloudera</value>
Within the metastore you can use a query similar to the following
mysql
select *
from metastore.DBS as d
join metastore.TBLS as t
on t.DB_ID =
d.DB_ID
where t.TBL_NAME like '% ... put somthing here ... %'
order by d.NAME
,t.TBL_NAME
;
Upvotes: 3