Rakesh Shrama
Rakesh Shrama

Reputation: 259

Replace quotes using lazy simple serde hive

Hi I am dealing with many files which has quotes in the data as shown below.

"ID"|"STUDENT"|"GRADE" "123"|"John"|"9.7" "132"|"Johny"|"8.7" "143"|"Ronny"|"8.17"

I would like to remove quotes from data can you please let me know how it can be done. If at all using any built in serdes will be helpfull. Since I am dealing with many such file.

Upvotes: 0

Views: 5497

Answers (1)

K S Nidhin
K S Nidhin

Reputation: 2650

Load this data as such into a temp hive table . Then use regex_replace() function while inserting into your table.

steps :

  1. load data into a temp table with similar schema.
  2. Insert overwrite into the final table with regex_replace().

    insert overwrite table select    regexp_replace(COLUMN_NAME_1,"\"",""),regexp_replace(COLUMN_NAME_2,"\"","") from temp_hive_table;
    

Updated :

For many files.

  1. Define the temp table as an external table.
  2. Copy all your source files to this hdfs path.
  3. Do insert overwrite with regex_replace() into the desired table.

Hope this approach helps.

Upvotes: 1

Related Questions