Hoa Le
Hoa Le

Reputation: 7

How to import data from folder to Hive with new columns as file's name and folder's name?

I have data input like this:

Structure of csv file is:

x,y
0.0,0.0
18.6,-11.1
36.1,-21.9
53.7,-32.6
70.1,-42.8
86.5,-52.6

I want to load all file in this folder to Hive table like:

id, x, y, file_name, folder_name
1, 0.0, 0.0, 1.csv, driver_1
...

How can I do it? Can anyone help me please?

Upvotes: 0

Views: 496

Answers (1)

Jeremy Beard
Jeremy Beard

Reputation: 2725

Hive has a virtual column named INPUT__FILE__NAME that contains the full path to the input file that contained the record. Then using REGEXP_EXTRACT we can extract out the parent directory and the filename:

SELECT
    x
  , y
  , REGEXP_EXTRACT(INPUT__FILE__NAME, '.*/(.*)/(.*)', 2) AS file_name
  , REGEXP_EXTRACT(INPUT__FILE__NAME, '.*/(.*)/(.*)', 1) AS folder_name
FROM
    table
;

Upvotes: 2

Related Questions