Reputation: 475
I imported data into Hadoop using Sqoop 1.4.6. Sqoop saves the imported data in HDFS as an extensionless file in CSV format. When I queried this file with Apache Drill, I got a "Table not found" error. In the storage plugin configuration I tried null, an empty string (""), and a space (" ") as the extension, but none of them let me query the file. The query does work once I rename the file to add an extension: any extension listed in the configuration works, just not a null one. For example, I could query the CSV-formatted file after giving it the extension 'mat' or anything else.
Is there any way to query the extensionless files?
Upvotes: 1
Views: 835
Reputation: 387
I had the same experience. First, I imported one table from Oracle into Hadoop 2.7.1 and then queried it via Drill. This is my plugin config, set through the web UI:
{
"type": "file",
"enabled": true,
"connection": "hdfs://192.168.19.128:8020",
"workspaces": {
"hdf": {
"location": "/user/hdf/my_data/",
"writable": false,
"defaultInputFormat": "csv"
},
"tmp": {
"location": "/tmp",
"writable": true,
"defaultInputFormat": null
}
},
"formats": {
"csv": {
"type": "text",
"extensions": [
"csv"
],
"delimiter": ","
}
}
}
Then, in the Drill CLI, I query like this:
USE hdfs.hdf;
SELECT * FROM `part-m-00000`;
Also, in the Hadoop file system, when I cat the contents of 'part-m-00000', the following is printed on the console:
2015-11-07 17:45:40.0,6,8
2014-10-02 12:25:20.0,10,1
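A note on the query for a file like part-m-00000: the name contains hyphens, so Drill's SQL parser needs it wrapped in backticks to treat it as a single identifier. The following is only a small sketch of building such a query string; the helper names `quote_drill_identifier` and `build_select_all` are made up for illustration and are not part of Drill.

```python
def quote_drill_identifier(name):
    """Wrap a file or table name in backticks for use in a Drill query."""
    # Assumption: names containing a backtick are out of scope for this sketch.
    if "`" in name:
        raise ValueError("backticks in names are not handled by this sketch")
    return f"`{name}`"


def build_select_all(workspace, file_name):
    """Build a SELECT * query against a file inside a Drill workspace."""
    # workspace is "<plugin>.<workspace>", e.g. "hdfs.hdf" from the config above.
    return f"SELECT * FROM {workspace}.{quote_drill_identifier(file_name)}"


query = build_select_all("hdfs.hdf", "part-m-00000")
print(query)  # SELECT * FROM hdfs.hdf.`part-m-00000`
```

The resulting string can be pasted into the Drill CLI (with a trailing semicolon) or submitted through whichever client you use.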
Upvotes: 0
Reputation: 2283
You can set a default input format in the storage plugin configuration to solve this problem. For example, start with a CSV file that has an extension:
select * from dfs.`/Users/khahn/Downloads/csv_line_delimit.csv`;
+-------------------------+
| columns |
+-------------------------+
| ["hello","1","2","3!"] |
. . .
Rename the file to remove the extension, then modify "location" and "defaultInputFormat" in the plugin config:
{
"type": "file",
"enabled": true,
"connection": "file:///",
"workspaces": {
"root": {
"location": "/Users/khahn/Downloads",
"writable": false,
"defaultInputFormat": "csv"
},
Then query the file that has no extension:
0: jdbc:drill:zk=local> select * from dfs.root.`csv_line_delimit`;
+-------------------------+
| columns |
+-------------------------+
| ["hello","1","2","3!"] |
. . .
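If you keep the plugin configuration as a JSON document and prefer to edit it with a script, the change described above can be sketched roughly as follows. This is only an illustration: the helper `set_default_input_format` is a hypothetical name, the sample config is modeled on the one in this answer, and the check that the format is listed under "formats" is a sanity check I added, not something taken from Drill itself.

```python
import copy
import json


def set_default_input_format(plugin_config, workspace, fmt):
    """Return a copy of a file-plugin config with defaultInputFormat set."""
    # Sanity check (an assumption of this sketch): the format name should be
    # one of the entries defined under "formats".
    if fmt not in plugin_config.get("formats", {}):
        raise ValueError(f"format {fmt!r} is not defined under 'formats'")
    updated = copy.deepcopy(plugin_config)  # leave the caller's dict untouched
    updated["workspaces"][workspace]["defaultInputFormat"] = fmt
    return updated


# Illustrative config modeled on the one shown above.
config = {
    "type": "file",
    "enabled": True,
    "connection": "file:///",
    "workspaces": {
        "root": {
            "location": "/Users/khahn/Downloads",
            "writable": False,
            "defaultInputFormat": None,
        }
    },
    "formats": {
        "csv": {"type": "text", "extensions": ["csv"], "delimiter": ","}
    },
}

updated = set_default_input_format(config, "root", "csv")
print(json.dumps(updated, indent=2))  # JSON to paste into the web UI
```

With "defaultInputFormat" set to "csv" on the workspace, files in that location that have no extension are read as CSV, which is what makes the query on `csv_line_delimit` work.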
Upvotes: 2