Earjina Ahmed

Reputation: 11

In Hive what is the difference between FIELDS TERMINATED BY '\u0004' and FIELDS TERMINATED BY '\u001C'

In my project I saw two Hive tables. In the CREATE TABLE statements, one table has ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0004' and the other has ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u001C'. What do '\u0004' and '\u001C' mean, and when should I use each of them?

Upvotes: 0

Views: 513

Answers (1)

Sam

Reputation: 9944

In many text formats, \u introduces a Unicode escape sequence. This is a way of storing or sending a character that can't be easily displayed or typed in the format you're using. The four hexadecimal digits after the \u are the Unicode "code point", a number denoting a specific Unicode character.

All characters have a code point, even the printable ones. For example, a is U+0061.

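U+0004 (End of Transmission) and U+001C (File Separator) are both non-printing control characters, meaning there's no standard glyph you can use to display them on the screen. That's why an escape sequence is used here. As a field delimiter, each one is just a single character that separates the fields, so the two tables differ only in which character they chose.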
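To see what these escapes stand for, here is a small sketch in Python (not Hive) that prints the code point, Unicode general category and printability of each character. The helper name describe is made up for this illustration.

```python
import unicodedata

def describe(ch):
    """Return (code point label, Unicode general category, printable?) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), ch.isprintable())

# "a" is an ordinary lowercase letter (category Ll).
print(describe("\u0061"))
# Both delimiters from the question are control characters (category Cc).
print(describe("\u0004"))
print(describe("\u001C"))
```

Both delimiters come back with category Cc (control) and are reported as not printable, while a is a normal letter.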

If you use a simple, printable character like a comma (,) as your field delimiter, it will make the stored data easier for a human to read. The field values will be stored with a comma between each one. For example, you might see the values one, two and three stored as:

one,two,three

But if you expect your field values to actually contain a comma, it would be a poor choice of field delimiter (because then you'd need a special way to tell the difference between a single field with a value of one,two and two different fields with the values one and two). Control characters such as U+0004 and U+001C almost never appear in real field values, which is why they are often picked as delimiters. The choice of delimiter depends both on whether you want to be able to read the data easily, and on what characters you expect the fields to contain.
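The ambiguity is easy to reproduce outside Hive. The following is a minimal Python sketch, assuming fields are simply joined and split on the delimiter with no quoting or escaping. It only illustrates the idea and is not how Hive itself reads files. The names join_fields and split_fields are hypothetical.

```python
def join_fields(fields, delimiter):
    """Write one row as a single line of delimited text."""
    return delimiter.join(fields)

def split_fields(line, delimiter):
    """Read a line back into its fields."""
    return line.split(delimiter)

row = ["one,two", "three"]  # the first field itself contains a comma

# With "," as the delimiter, the two fields come back as three.
comma_line = join_fields(row, ",")
print(split_fields(comma_line, ","))

# With the control character U+0004, the row survives the round trip.
ctrl_line = join_fields(row, "\u0004")
print(split_fields(ctrl_line, "\u0004"))
```

With the comma, the row is read back as three fields instead of two. With \u0004, the original two fields are recovered, at the cost of a file that is harder to read in a text editor.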

Upvotes: 0
