Infinite
Infinite

Reputation: 764

AWS Athena - how to escape characters like ',' which are present in quotes

I have a table in Athena with three columns created with below template

Resources:
    TempDataMasterTable:
       Type: AWS::Glue::Table
       Properties:
         DatabaseName: "temp_db"
         CatalogId: !Ref AWS::AccountId
         TableInput:
           Name: "temp_table"
           Description: "Master table"
           TableType: EXTERNAL_TABLE
           Parameters: { "classification" : "csv", "compressionType" : "none", "typeOfData": "file" }
           StorageDescriptor:
             Location: s3://temp-location/temp-folder/
             InputFormat: 'org.apache.hadoop.mapred.TextInputFormat'
             OutputFormat: 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
             SerdeInfo:
               Parameters:
                 serialization.format: ','
                 field.delim: ','
                 SerializationLibrary: 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
             Columns:
             - {"Name": "name", "Type": "string"}
             - {"Name": "address", "Type": "string"}
             - {"Name": "desig", "Type": "string"}

So basically columns are delimited by ',' but in my column 2 - address it is possible to have ',' for e.g one of the values in underlying csv has "housenumber,street,city" when we view the data in Athena the table is not being parsed correctly as street in the column 2 is being rendered as third column because of ',' being the delimiter.

Is there a way to escape the ',' which are given in quotes ?

Upvotes: 1

Views: 3124

Answers (1)

Prabhakar Reddy
Prabhakar Reddy

Reputation: 5124

Replace serde org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe with org.apache.hadoop.hive.serde2.OpenCSVSerde. LazySimpleSerDe cannot well when there are commas present in side a column where as OpenCSVSerde can.

Upvotes: 2

Related Questions