I have a folder (or stream) of different, complex XML files (each about 1 GB). I know how to load an XML file's data into a Hive table (or any Hadoop database).
But I want to know two things:
"Stream of different complex XML files --> Load into Hive tables (without manually writing a CREATE TABLE command) --> Use the data loaded into the Hive tables"
Regarding your first question: AFAIK, it is not possible. Hive is designed to manage data through Hive tables, and a table has to be defined (with CREATE TABLE) before Hive can query the data. The data is not always stored inside the table itself: with Hive external tables, only the metadata is added to the table, and it points to the real data in HDFS.
The only workaround I can think of is to create a single big table for all the data in your XML files, both the ones already stored and the future ones. The trick is to put all the XML files under a common HDFS folder and use that folder as the LOCATION of the CREATE TABLE command.
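A minimal sketch of that single-table trick, assuming you build the DDL in Java before sending it to Hive. The class and method names (`XmlTableDdl`, `createExternalTable`), the SerDe class `com.example.MyXmlSerDe` and the folder `/data/xml_incoming` are hypothetical placeholders. Because the table is EXTERNAL and its LOCATION is one shared folder, Hive reads whatever files are in that folder at query time, so files copied there later are picked up without another CREATE TABLE. Depending on the XML SerDe you use, you may also need SERDEPROPERTIES or a STORED AS INPUTFORMAT clause; that is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public final class XmlTableDdl {

    private XmlTableDdl() {}

    /**
     * Builds a CREATE EXTERNAL TABLE statement whose LOCATION is one shared HDFS folder.
     * Column order follows the iteration order of the map (use a LinkedHashMap).
     */
    public static String createExternalTable(String tableName,
                                             Map<String, String> columns,
                                             String serdeClass,
                                             String hdfsFolder) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        // Render "name TYPE, name TYPE, ..."
        StringJoiner cols = new StringJoiner(", ");
        for (Map.Entry<String, String> c : columns.entrySet()) {
            cols.add(c.getKey() + " " + c.getValue());
        }
        return "CREATE EXTERNAL TABLE IF NOT EXISTS " + tableName
                + " (" + cols + ")"
                + " ROW FORMAT SERDE '" + serdeClass + "'"
                + " LOCATION '" + hdfsFolder + "'";
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("id", "STRING");
        columns.put("payload", "STRING");
        // Hypothetical names: replace with your table, XML SerDe class and HDFS folder.
        System.out.println(createExternalTable(
                "all_xml", columns, "com.example.MyXmlSerDe", "/data/xml_incoming"));
    }
}
```

The resulting string is what you would pass as the query in the JDBC client.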
Regarding your second question, please refer to this code:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public final class HiveBasicClient {
    // Driver for the original HiveServer (HiveServer1); HiveServer2 uses org.apache.hive.jdbc.HiveDriver
    private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";
    private static Connection con;

    // Fail loudly instead of returning null, so connection problems are visible
    private static Connection getConnection(String hiveServer, String hivePort, String hadoopUser, String hadoopPassword)
            throws ClassNotFoundException, SQLException {
        Class.forName(driverName);
        return DriverManager.getConnection("jdbc:hive://" + hiveServer + ":" + hivePort + "/default", hadoopUser, hadoopPassword);
    }

    // DDL such as CREATE TABLE returns no result set, so use execute() instead of executeQuery()
    private static void doQuery(String query) throws SQLException {
        try (Statement stmt = con.createStatement()) {
            stmt.execute(query);
        }
    }

    public static void main(String[] args) throws Exception {
        String hiveServer = args[0];
        String hivePort = args[1];
        String hadoopUser = args[2];
        String hadoopPassword = args[3];
        con = getConnection(hiveServer, hivePort, hadoopUser, hadoopPassword);
        try {
            // Replace the <...> placeholders with your table name, columns, XML SerDe and HDFS folder
            doQuery("create external table <table_name> (<list_of_columns>) row format serde '<your_xml_serde>' location '<your_xml_files_location>'");
        } finally {
            con.close();
        }
    }
}
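If your cluster runs HiveServer2 (the server used by newer Hive releases), the driver class is `org.apache.hive.jdbc.HiveDriver` and the URL scheme is `jdbc:hive2://`. A minimal sketch under that assumption, using try-with-resources so the connection and statement are always closed; the class name `Hive2DdlClient` and its helper methods are hypothetical, and the hive-jdbc jar plus its dependencies must be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public final class Hive2DdlClient {

    // HiveServer2 JDBC driver class from the hive-jdbc artifact.
    private static final String DRIVER = "org.apache.hive.jdbc.HiveDriver";

    private Hive2DdlClient() {}

    /** Builds a HiveServer2 JDBC URL such as jdbc:hive2://host:10000/default. */
    public static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    /** Runs one DDL statement; execute() is used because DDL returns no ResultSet. */
    public static void runDdl(String url, String user, String password, String ddl)
            throws ClassNotFoundException, SQLException {
        Class.forName(DRIVER); // fails fast if hive-jdbc is not on the classpath
        try (Connection con = DriverManager.getConnection(url, user, password);
             Statement stmt = con.createStatement()) {
            stmt.execute(ddl);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 5) {
            System.err.println("usage: Hive2DdlClient <host> <port> <user> <password> <ddl>");
            return;
        }
        String url = jdbcUrl(args[0], Integer.parseInt(args[1]), "default");
        runDdl(url, args[2], args[3], args[4]);
    }
}
```

HiveServer2 listens on port 10000 by default, so a typical call passes that port together with the CREATE EXTERNAL TABLE statement as the last argument.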
Hope it helps.