Reputation: 39
I made a simple bash skript, which is supposed to count columns in every table in a specified database. The problem is, that the part
`hive -S -e 'use '$database'; describe '$word'' | grep -e '^[a-z]' | uniq |wc -l >>MY_PROJECTS/ColumnCountList.txt`
consumes a lot of time because of hive connection - everytime it counts columns in every next table, it connects to hive again.
Is there any way to connect to Hive once, and then just execute commands ? It would be much faster especially when im doing it for db with 1k tables. Code below:
#!/bin/bash
database=pos
tables_list=`hive -S -e 'use '$database'; show tables' | grep -e '^[a-z]' | uniq`
for word in $tables_list
do
`echo -n "$word : ">>MY_PROJECTS/ColumnCountList.txt`
`hive -S -e 'use '$database'; describe '$word'' | grep -e '^[a-z]' | uniq |wc -l >>MY_PROJECTS/ColumnCountList.txt`
done
Upvotes: 0
Views: 5256
Reputation: 3956
If you are using mysql as metastore database, this mysql query can give the count per table. If you want filter on database name then you have to add join to DBs.
select t.TBL_NAME, count(1) from TBLS t join COLUMNS_V2 c
on t.TBL_ID = c.CD_ID
group by t.TBL_NAME;
Upvotes: 0
Reputation: 1628
Build a file ( here called desc_tbl_list.sql
) using your bash script that contains the describe command and table name, like so:
describe tablename1
describe tablename2
...
Then run the script using hive command containing all the describe commands like so:
hive -f desc_tbl_list.sql
HTH
Upvotes: 1