Harsh Choudhary

Reputation: 475

Query Extensionless File using Apache Drill

I imported data into Hadoop using Sqoop 1.4.6. Sqoop saves the imported data in HDFS as a file with no extension, although the contents are in CSV format. When I used Apache Drill to query this file, I got a "Table not found" error. In the storage plugin configuration, I tried null, an empty string (""), and a space (" ") in extensions, but still could not query the file. I could query the file once I renamed it with an extension. Any extension listed in the configuration works, just not a null one; for example, I could query the CSV-formatted file with the extension 'mat' or anything else.

Is there any way to query the extensionless files?

Upvotes: 1

Views: 835

Answers (2)

ArefehTam

Reputation: 387

I have the same experience. First, I imported one table from Oracle into Hadoop 2.7.1 and then queried it via Drill. This is my plugin configuration, set through the web UI:

{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://192.168.19.128:8020",
  "workspaces": {
    "hdf": {
      "location": "/user/hdf/my_data/",
      "writable": false,
      "defaultInputFormat": "csv"
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","
    }
  }
}

Then, in the Drill CLI, I query like this:

USE hdfs.hdf
SELECT * FROM part-m-00000

Also, in the Hadoop file system, when I cat the contents of 'part-m-00000', the following is printed to the console:

2015-11-07 17:45:40.0,6,8
2014-10-02 12:25:20.0,10,1
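
One detail that may matter here, offered as a guess rather than a confirmed diagnosis: Drill expects backticks around identifiers that contain hyphens, so a name like part-m-00000 would normally be quoted. Assuming the plugin is registered as hdfs with the hdf workspace, as in the USE statement above, the statements might look like this:

```sql
-- Assumes the storage plugin is named hdfs, matching the USE statement above.
USE hdfs.hdf;

-- Backticks keep part-m-00000 from being parsed as an arithmetic expression.
SELECT * FROM `part-m-00000`;
```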

Upvotes: 0

catpaws

Reputation: 2283

You can use a default input format in the storage plugin configuration to solve this problem. For example:

select * from dfs.`/Users/khahn/Downloads/csv_line_delimit.csv`;
+-------------------------+
|         columns         |
+-------------------------+
| ["hello","1","2","3!"]  |
 . . .

Rename the file to remove the extension, then set "location" and "defaultInputFormat" in the plugin config:

{
  "type": "file",
  "enabled": true,
  "connection": "file:///",
  "workspaces": {
    "root": {
      "location": "/Users/khahn/Downloads",
      "writable": false,
      "defaultInputFormat": "csv"
    },

Query the file that has no extension.

0: jdbc:drill:zk=local> select * from dfs.root.`csv_line_delimit`;
+-------------------------+
|         columns         |
+-------------------------+
| ["hello","1","2","3!"]  |
. . .
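
If individual fields are needed from a file read this way, the columns array shown in the output can be indexed. A small sketch, reusing the dfs.root workspace and file name from this answer; the aliases and the cast to INT are illustrative assumptions, not part of the original setup:

```sql
-- Delimited text read without a header row is exposed as a single columns array.
SELECT columns[0] AS greeting,            -- alias is hypothetical
       CAST(columns[1] AS INT) AS n       -- assumes the second field is numeric
FROM dfs.root.`csv_line_delimit`;
```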

Upvotes: 2
