ibh
ibh

Reputation: 95

Automated List of Hive External tables

I have to create an automated process to list all external tables in Hive and do a record count on those tables.

I should do this as a daily job. I tried this by hard coding all the external table names, but this is not accepted as the tables keep on changing once in a month.

I have gone through different approaches like [show tables] and executing query in metastore DB. But these will not help me in automating the process.

Is there a better approach to implement this in Hive.

Upvotes: 1

Views: 1346

Answers (3)

Richard Zuidhof
Richard Zuidhof

Reputation: 11

I liked the method of interacting with the Hive Metastore database in MySQL. You can wrap this in a shell script.

select d.NAME,t.TBL_NAME,t.TBL_TYPE,s.LOCATION from TBLS t JOIN DBS d ON t.DB_ID=d.DB_ID JOIN SDS s ON t.SD_ID=s.SD_ID where t.TBL_TYPE='EXTERNAL_TABLE';

Upvotes: 0

maheshchimmiri
maheshchimmiri

Reputation: 11

As per my understanding first your requirement is to find out list of Hive external tables in all hive databases .

Database name,Table name,Table Type(External) and HDFS Location of hive External tables.

Login to hive metastore and use Hive meta database.

Use 3 tables TBLS,DBS and SDS tables , on top of this 3 tables we can apply joins on DB_ID and SD_ID

By using above format you can get databases names along with respected hive external tables list and HDFS path location also .

For query and output information please read this link

https://askdoubts.com/question/how-to-find-out-list-of-all-hive-external-tables-and-hdfs-paths-from-hive-metastore/#comment-19

Upvotes: 0

leftjoin
leftjoin

Reputation: 38335

Something like this, using shell.

#Create external table list for a schema
SCHEMA=your_schema_name 

#define filenames   
alltableslist=tables_$SCHEMA
exttablelist=ext_tables_$SCHEMA

#Get all tables
 hive -S -e " set hive.cli.print.header=false; use $SCHEMA; show tables;" 1> $alltableslist


#For each table check its type:
for table in $(cat $alltableslist)
 do 

 echo Processing table $table ...

     #Describe table
     describe=$(hive client -S -e "use $SCHEMA; DESCRIBE FORMATTED $table")

     #Get type
     table_type=$(echo "${describe}" | egrep -o 'Table Type:[^,]+' | cut -f2)

     #Check table type, get count and write table name with count
      if [ $table_type == EXTERNAL_TABLE ]; then 
         #get count
          cnt=$(hive client -S -e "select count(*) from $SCHEMA.table ")
         #save result
          echo "$table $cnt" > $exttablelist 
      fi

done; #tables loop

Just replace your_schema_name at the beginning with your schema name. External tables with counts in this example will be saved in the file ext_tables_[your_schema_name]

Counts can be processed in parallel and even in single SQL statement and many other things can be improved, but hope you have caught the idea.

Upvotes: 1

Related Questions