Alekhya Vemavarapu

Reputation: 1155

Dynamically load different XML files to Hive tables

I have a folder/stream of different complex XML files (each about 1 GB in size). I know how to load the data from an XML file into a Hive table (or any Hadoop database).
But I want to know two things:

  1. Can I load each XML file's data into Hive dynamically, i.e. without explicitly writing a create table command (because I receive different XML files as a stream)? Is there any way to do this automatically?

"Stream of different complex XML files --> Load to Hive tables (without manually writing a create table command) --> Use the data loaded into the Hive tables"

  2. Instead of writing command-line scripts to create Hive tables, how can I write Java code to load XML data into a Hive table?

Upvotes: 0

Views: 639

Answers (1)

frb

Reputation: 3798

Regarding your first question: as far as I know, it is not possible. Hive is intended to manage data that belongs to Hive tables. The data is not always stored within the table itself: in the case of Hive external tables, the table only holds metadata that points to the real data.

The only thing I think you can try is to create a single big table for all the data in your XML files, both the ones already stored and the future ones. The trick is to put all the XML files under a common HDFS folder and use that folder as the location in the create table command.
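As a rough sketch of the single-table idea above, the create table statement can be assembled in Java from a table name, a column list, a SerDe class and the common HDFS folder. Everything named here is hypothetical and only for illustration: the class `ExternalXmlTableDdl`, the table `xml_records`, the SerDe `com.example.HypotheticalXmlSerDe` and the folder `/data/xml/incoming`. A real XML SerDe will likely need extra clauses (for example SerDe properties or an input format), so check its documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public final class ExternalXmlTableDdl {

    private ExternalXmlTableDdl() {
    }

    // Builds a CREATE EXTERNAL TABLE statement whose LOCATION is the common
    // HDFS folder; any XML file dropped into that folder later is covered by
    // the same table without issuing another create table command.
    public static String build(String table, Map<String, String> columns,
                               String serdeClass, String hdfsLocation) {
        StringJoiner cols = new StringJoiner(", ");
        for (Map.Entry<String, String> column : columns.entrySet()) {
            cols.add(column.getKey() + " " + column.getValue());
        }
        return "CREATE EXTERNAL TABLE IF NOT EXISTS " + table
                + " (" + cols + ")"
                + " ROW FORMAT SERDE '" + serdeClass + "'"
                + " LOCATION '" + hdfsLocation + "'";
    }

    public static void main(String[] args) {
        // Hypothetical columns; LinkedHashMap keeps them in insertion order.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("id", "STRING");
        columns.put("payload", "STRING");

        // Hypothetical SerDe class and HDFS folder, for illustration only.
        System.out.println(build("xml_records", columns,
                "com.example.HypotheticalXmlSerDe", "/data/xml/incoming"));
    }
}
```

The resulting string can be passed to whatever method runs statements against Hive over JDBC. Note that the inputs are concatenated without escaping, so this is only suitable for trusted, hard-coded values.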

Regarding your second question, please refer to this code:

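import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public final class HiveBasicClient {

    // Driver for the original HiveServer. HiveServer2 uses
    // "org.apache.hive.jdbc.HiveDriver" with a "jdbc:hive2://" URL instead.
    private static final String DRIVER_NAME = "org.apache.hadoop.hive.jdbc.HiveDriver";
    private static Connection con;

    // Returns an open connection, or null if the driver is missing or the connection fails.
    private static Connection getConnection(String hiveServer, String hivePort, String hadoopUser, String hadoopPassword) {
        try {
            Class.forName(DRIVER_NAME);
        } catch (ClassNotFoundException e) {
            System.err.println("Hive JDBC driver not found: " + e.getMessage());
            return null;
        }

        try {
            return DriverManager.getConnection("jdbc:hive://" + hiveServer + ":" + hivePort + "/default?user=" + hadoopUser + "&password=" + hadoopPassword);
        } catch (SQLException e) {
            System.err.println("Could not connect to Hive: " + e.getMessage());
            return null;
        }
    }

    // Runs a statement that returns no rows, such as a create table command.
    private static void doQuery(String query) {
        // try-with-resources closes the statement even if execution fails
        try (Statement stmt = con.createStatement()) {
            stmt.execute(query);
        } catch (SQLException ex) {
            System.err.println("Query failed: " + ex.getMessage());
            System.exit(1);
        }
    }

    public static void main(String[] args) throws SQLException {
        String hiveServer = args[0];
        String hivePort = args[1];
        String hadoopUser = args[2];
        String hadoopPassword = args[3];

        con = getConnection(hiveServer, hivePort, hadoopUser, hadoopPassword);
        if (con == null) {
            System.exit(1);
        }

        // Replace the <...> placeholders with your own table name, columns, SerDe and HDFS folder.
        doQuery("create external table <table_name> (<list_of_columns>) row format serde '<your_xml_serde>' location '<your_xml_files_location>'");
        con.close();
    }

}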

Hope it helps.

Upvotes: 1
