Arpan
Arpan

Reputation: 993

Is it possible in PIG to create column field by defining column field value

Suppose i have below Structured data file

1298712012061228765236542123049824234209374 1203972012073042198531203948203498023498023 1203712012092329385612350924395798456892345 1234812012101223423498230482034893204820398

Here in above file first 6 digit are UserId from (1-6) next 8 digit are year_date from (7-12) column next 6 column is Count field from (13-18) , then similarly I have product_id from (19-30) and Character_values column from (31-42) for the above flat file, So I want my data in below format. I mean to say is that I want to load my data using this mentioned field. Is there any option available in PIG or HIVE for this?

enter image description here

Upvotes: 1

Views: 335

Answers (2)

Aman
Aman

Reputation: 3261

You can use it both in pig and hive. Two solutions are below
PIG:

data = LOAD '/data.txt' USING PigStorage() AS (line);
strsplit = FOREACH data GENERATE 
SUBSTRING(line,1,6) AS UserID,
SUBSTRING(line,7,12) AS year_date,
SUBSTRING(line,13,18) AS Count,
SUBSTRING(line,19,30) AS product_id,
SUBSTRING(line,31,42) AS Character_values;  

When you dump :
dump strsplit; (29871,29871,29871,29871,29871)
(20397,20397,20397,20397,20397)
(20371,20371,20371,20371,20371)
(23481,23481,23481,23481,23481)

HIVE:

Step1:create a temp table and load the original data;

create table temp(line String)
ROW FORMAT DELIMITED
LINES TERMINATED BY '\n';
LOAD DATA INPATH '/data.txt' INTO TABLE temp;  

Step2:create a table which will fit your data.

   create table user(UserID String,year_date String,Count String,product_id String,Character_values String)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'; 

Step3:Inster you temp table into actula table

INSERT INTO TABLE user
SELECT substr(line,0,6),substr(line,7,12),substr(line,13,18),substr(line,19,30),substr(line,31,42)FROM temp;

Upvotes: 3

maxymoo
maxymoo

Reputation: 36545

Can you use SUBSTRING?

A = LOAD 'DATA' USING PigStorage() AS (line); 
B = FOREACH A GENERATE SUBSTRING(line,1,6) AS UserID, SUBSTRING(line,7,12) AS Year_date ...

Upvotes: 2

Related Questions