Reputation: 196
I currently have some HIVE code that creates a table where I declare all the column names and type. Then I use "load data inpath" to load a CSV into that table so I can then join to my database tables. The problem lies in that sometimes the CSV columns maybe in a different order. I can not control that as it is sent to me from a different source. I am wondering if there is a way to create the temp table I do daily without declaring the column names and just allow it to read from the CSV. This would allow me to not have to manually review the file every morning to check the columns are in the right order?
Upvotes: 1
Views: 783
Reputation: 49260
Because the column order keeps changing, as a first step you could use a shell script to read the header columns and generate a create table
script for the temp table. Thereafter,execute the generated create table string and load
the file into the temp table. From temp table, you can then load it to the target table.
A sample bash script to give you an idea.
#!/bin/bash
#File check
if [ -s "$1" ]
then echo "File $1 exists and is non-empty"
else
echo "File $1 does not exist or is empty"
fi
create_tbl_cmd="create table tblname ("
#Get fields from file header
fields=`head -1 $1`
#add string as the datatype for each field
fields="${fields//,/ string,} string)"
create_table_cmd="$create_tbl_cmd$fields"
#echo $create_table_cmd
#Execute the $create_table_cmd using Beeline or Hive command line invoking the necessary command
#Execute the Load into temp table
#Execute the Insert from temp table to target table
Execute the bash script above with the csv file argument as
bash scriptname.sh filename.csv
Upvotes: 3