Reputation: 764
I have a table in Athena with three columns created with below template
Resources:
TempDataMasterTable:
Type: AWS::Glue::Table
Properties:
DatabaseName: "temp_db"
CatalogId: !Ref AWS::AccountId
TableInput:
Name: "temp_table"
Description: "Master table"
TableType: EXTERNAL_TABLE
Parameters: { "classification" : "csv", "compressionType" : "none", "typeOfData": "file" }
StorageDescriptor:
Location: s3://temp-location/temp-folder/
InputFormat: 'org.apache.hadoop.mapred.TextInputFormat'
OutputFormat: 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
SerdeInfo:
Parameters:
serialization.format: ','
field.delim: ','
SerializationLibrary: 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
Columns:
- {"Name": "name", "Type": "string"}
- {"Name": "address", "Type": "string"}
- {"Name": "desig", "Type": "string"}
So basically columns are delimited by ',' but in my column 2 - address it is possible to have ',' for e.g one of the values in underlying csv has "housenumber,street,city" when we view the data in Athena the table is not being parsed correctly as street in the column 2 is being rendered as third column because of ',' being the delimiter.
Is there a way to escape the ',' which are given in quotes ?
Upvotes: 1
Views: 3124
Reputation: 5124
Replace serde org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
with org.apache.hadoop.hive.serde2.OpenCSVSerde
. LazySimpleSerDe
cannot well when there are commas present in side a column where as OpenCSVSerde
can.
Upvotes: 2