hiphop
hiphop

Reputation: 81

Snowflake - Escape backslash and double quotes

I have a data file in a BLOB and I am trying to create an external table from it. Here's how the file looks in a text editor:

"ABC"|"123"|"ID1"
"XYZ"|"456"|"ID2"

When I create an external table with a pipe as the delimiter and then query the individual columns, I see these values:

value:c1|value:c2|value:c3
"\"ABC\"" "\"123\"" "\"ID1\""
"\"XYZ\"" "\"345\"" "\"ID2\""

Is there something that I need to add to the field_optionally_enclosed_by option? I am easily able to do

select replace(value:c1,'\\"'),replace(value:c2,'\\"'),replace(value:c3,'\\"') from testable

and this gives me back

value:c1|value:c2|value:c3
"ABC" "123" "ID1"
"XYZ" "345" "ID2"

Any suggestions?

Upvotes: 2

Views: 5060

Answers (2)

Eric Lin
Eric Lin

Reputation: 1510

Please try below and see if it helps, it works in my test:

create or replace external table external_table 
 with location = @test.ericlin_s3_stage/t1/
 auto_refresh = true
 REFRESH_ON_CREATE = true
 FILE_FORMAT = (
    TYPE=CSV 
    COMPRESSION=NONE 
    REPLACE_INVALID_CHARACTERS = TRUE 
    FIELD_DELIMITER = '|' 
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'
    ESCAPE_UNENCLOSED_FIELD = '\\'
  );
select 
    value:c1, value:c2, value:c3 
from external_table;

+----------+----------+----------+
| VALUE:C1 | VALUE:C2 | VALUE:C3 |
|----------+----------+----------|
| "ABC"    | "123"    | "ID1"    |
| "XYZ"    | "456"    | "ID2"    |
+----------+----------+----------+


select 
    value:c1::string as c1, 
    value:c2::string as c2, 
    value:c3::string as c3
from external_table;

+-----+-----+-----+
| C1  | C2  | C3  |
|-----+-----+-----|
| ABC | 123 | ID1 |
| XYZ | 456 | ID2 |
+-----+-----+-----+

Upvotes: 1

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25903

If your data is always fixed width, and you are want independent values for writting to the external table, then SPLIT and a nested select could help here:

SELECT
    get(s,0) as a,
    get(s,1) as b,
    get(s,2) as c
FROM (
    SELECT split(column1,'|') as s
    FROM VALUES
        ('"ABC"|"123"|"ID1"'),
        ('"XYZ"|"456"|"ID2"')
);

gives:

A B C
""ABC"" ""123"" ""ID1""
""XYZ"" ""456"" ""ID2""

this look funning because the output layer has escaped the inner double quotes. So you could use TRIM to drop those:

SELECT 
    trim(get(s,0),'"') as a,
    trim(get(s,1),'"') as b,
    trim(get(s,2),'"') as c
FROM (
    SELECT split(column1,'|') as s
    FROM VALUES
        ('"ABC"|"123"|"ID1"'),
        ('"XYZ"|"456"|"ID2"')
);
A B C
ABC 123 ID1
XYZ 456 ID2

and if you dislike the sub-select you can use SPLIT_PART

SELECT 
    trim(split_part(column1,'|',0),'"') as a,
    trim(split_part(column1,'|',1),'"') as b,
    trim(split_part(column1,'|',2),'"') as c
FROM VALUES
    ('"ABC"|"123"|"ID1"'),
    ('"XYZ"|"456"|"ID2"')
;

Upvotes: 0

Related Questions