Shrikant

Reputation: 803

Handling conditional file in Hadoop

I am stuck and have not been able to find a solution so far. I need to convert a fixed-length file into a Control-A ("\001") delimited file, but the file is laid out so that, based on one of the attributes (record_type), the schema changes for that particular record. I don't know how I can convert this fixed-width file into a delimited file.

Sample Record :

NCBDXDD00C98             0002016-01-0402.30.33013000895527        821064      07.30.332016-01-0400895527        000000
NCBDXDT00C98           5108050000007851   000821064                                 0R 

Ab Initio DML:
record  //RECORD-START
    ascii string(1) RECORD_PREFIX;
    ascii string(4) TRANSMISSION_PROCESS;
    ascii string(2) RECORD_TYPE;
    ascii string(5) HEADER_INSTITUTION_ID;
    ascii string(11) HEADER_PREFIX_NUMBER;
if (RECORD_TYPE == "DT") //Changed single quotes to double quotes - Sathish Ethiraj
  record  //digital_transaction_rec
        ascii string("\001") dt_cardholder_account_number = NULL("");
        ascii decimal("\001") dt_member_number = NULL("");
        ascii string("\001") dt_terminal_sequence_number = NULL("");
        ascii string("\001") dt_tokenization_message_type = NULL("");
        ascii string("\001") dt_payment_token = NULL("");
        ascii string("\001") dt_token_expiration_date = NULL("");
        ascii string("\001") dt_account_number_indicator = NULL("");
        ascii string("\001") dt_tokenization_event_indicator = NULL("");
        ascii string("\001") dt_transaction_status_indicator = NULL("");
        ascii string("\001") dt_transaction_category_code = NULL("");
        ascii string("\001") dt_payment_initiation_channel = NULL("");
        ascii string("\001") dt_wallet_program_indicator = NULL("");
        ascii string("\001") dt_on_behalf_service_1 = NULL("");
        ascii string("\001") dt_on_behalf_service_2 = NULL("");
        ascii string("\001") dt_on_behalf_result_1 = NULL("");
        ascii string("\001") dt_on_behalf_result_2 = NULL("");
        ascii string("\001") dt_primary_account_number_source = NULL("");
        ascii string("\001") dt_payment_appn_instance_id = NULL("");
  end digital_transaction_rec;
if (RECORD_TYPE == "DD")
  record  //DOLLAR-LOG-REC
        ascii string("\001") dd_hdr_instun_id = NULL("");
        ascii string("\001") dd_hdr_prfx_num = NULL("");
        ascii string("\001") dd_cdhldr_acct_num = NULL("");
        ascii string("\001") dd_mbr_num = NULL("");
        ascii string("\001") dd_mtv_trxn_dt = NULL("");
        ascii string("\001") dd_mtv_trxn_tm = NULL("");
        ascii string("\001") dd_trxn_rqst_type_cd = NULL("");
        ascii string("\001") dd_trmnl_num = NULL("");
        ascii string("\001") dd_trmnl_seq_num = NULL("");
  end dollar_log_rec;
end; //RECORD-END

I tried using the substr function and loading into Hive, but I am not able to apply the conditions. Any help in this regard would be appreciated.
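If preprocessing the file outside of Hive is an option, the conditional slicing can be done before loading. The sketch below (Python, usable as a Hadoop Streaming mapper) takes the header widths from the DML above; the per-type body widths are hypothetical placeholders and would have to be filled in from the actual fixed-length layout for DT and DD records.

```python
import sys

# Fixed header layout taken from the DML: prefix(1), transmission
# process(4), record type(2), institution id(5), prefix number(11).
HEADER_WIDTHS = [1, 4, 2, 5, 11]

# Body field widths per record type. These numbers are HYPOTHETICAL
# placeholders -- the real widths must come from the fixed-length
# record layout for DT and DD records.
BODY_WIDTHS = {
    "DT": [19, 13, 12],
    "DD": [16, 10, 8],
}

def slice_fields(line, widths):
    """Cut a fixed-width line into fields at the given widths."""
    fields, pos = [], 0
    for w in widths:
        fields.append(line[pos:pos + w])
        pos += w
    if pos < len(line):          # keep any unaccounted-for tail as one field
        fields.append(line[pos:])
    return fields

def to_delimited(line):
    """Convert one fixed-width record to a Control-A delimited record."""
    record_type = line[5:7]      # RECORD_TYPE offset per the header layout
    widths = HEADER_WIDTHS + BODY_WIDTHS.get(record_type, [])
    return "\x01".join(f.strip() for f in slice_fields(line, widths))

if __name__ == "__main__" and not sys.stdin.isatty():
    # e.g. run as a Hadoop Streaming mapper: one record per input line
    for raw in sys.stdin:
        print(to_delimited(raw.rstrip("\n")))
```

The output still has a different column count per record type, so you would split it into per-type files (or partition on the record-type column) before loading into Hive.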

Upvotes: 0

Views: 79

Answers (1)

Alex Libov

Reputation: 1491

A table in Hive must have a defined schema, which can't change depending on the input. Maybe you can split your input file into two files, one where record_type == "DT" and one where record_type == "DD", and then load each of them into a different table.
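The split itself can be a few lines of Python; the two-character record type sits at zero-based offset 5:7 per the question's DML header (file names here are only illustrative):

```python
def split_by_record_type(src_path, dest_paths):
    """Copy each line of a mixed file into the output file registered
    for its record type (zero-based slice 5:7 of the line).
    dest_paths maps a record type, e.g. "DT", to an output path."""
    handles = {rtype: open(path, "w") for rtype, path in dest_paths.items()}
    try:
        with open(src_path) as src:
            for line in src:
                rtype = line[5:7]
                if rtype in handles:     # silently skip unknown types
                    handles[rtype].write(line)
    finally:
        for handle in handles.values():
            handle.close()

# Example call (paths are illustrative):
# split_by_record_type("input.dat",
#                      {"DT": "records_dt.dat", "DD": "records_dd.dat"})
```

Each output file then matches a single fixed layout and can be loaded into its own Hive table.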

Upvotes: 2
