Sotark
Sotark

Reputation: 196

Create table in HIVE using column names from file

I currently have some HIVE code that creates a table where I declare all the column names and type. Then I use "load data inpath" to load a CSV into that table so I can then join to my database tables. The problem lies in that sometimes the CSV columns maybe in a different order. I can not control that as it is sent to me from a different source. I am wondering if there is a way to create the temp table I do daily without declaring the column names and just allow it to read from the CSV. This would allow me to not have to manually review the file every morning to check the columns are in the right order?

Upvotes: 1

Views: 783

Answers (1)

Vamsi Prabhala
Vamsi Prabhala

Reputation: 49260

Because the column order keeps changing, as a first step you could use a shell script to read the header columns and generate a create table script for the temp table. Thereafter,execute the generated create table string and load the file into the temp table. From temp table, you can then load it to the target table.

A sample bash script to give you an idea.

#!/bin/bash

#File check
if [ -s "$1" ]
then echo "File $1 exists and is non-empty"
else
echo "File $1 does not exist or is empty"
fi

create_tbl_cmd="create table tblname ("    
#Get fields from file header
fields=`head -1 $1`    
#add string as the datatype for each field
fields="${fields//,/ string,} string)"
create_table_cmd="$create_tbl_cmd$fields"
#echo $create_table_cmd

#Execute the $create_table_cmd using Beeline or Hive command line invoking the necessary command

#Execute the Load into temp table

#Execute the Insert from temp table to target table 

Execute the bash script above with the csv file argument as

bash scriptname.sh filename.csv

Upvotes: 3

Related Questions