NEO

Reputation: 399

HDFS - extra data after last expected column

We have a source and a target system. We are trying to import data from SQL Server 2012 into Pivotal Hadoop (PHD 3.0) using the Talend tool.

We are getting this error:

ERROR: extra data after last expected column  (seg0 slice1 datanode.domain.com:40000 pid=15035)
  Detail: External table pick_report_stg0, line 5472 of pxf://masternnode/path/to/hdfs?profile=HdfsTextSimple: "5472;2016-11-28 08:39:54.217;;2016-11-15 00:00:00.0;SAMPLES;0005525;MORGAN -EVENTS;254056;1;IHBL-NHO..."

We tried

We have identified the bad line with:

[hdfs@mdw ~]$ hdfs dfs -cat /path/to/hdfs | grep 3548

3548;2016-11-28 04:21:39.97;;2016-11-15 00:00:00.0;SAMPLES;0005525;MORGAN -EVENTS;254056;1;IHBL-NHO-13OZ-01;0;ROC NATION; NH;2016-11-15 00:00:00.0;2016-11-15 00:00:00.0;;2.0;11.99;SA;SC01;NH02;EA;1;F2;NEW PKG ONLY PLEASE!! BY NOON
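To see how many lines are affected, one option is to count the delimited fields per line. This is only a sketch, assuming the same placeholder path as above and the 24 columns of the external table definition below:

# Sketch: print the line number and field count for every line that does
# not have exactly 24 ';'-separated fields (quoted fields are not handled).
hdfs dfs -cat /path/to/hdfs | awk -F';' 'NF != 24 {print NR": "NF" fields"}'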

Structure of the external table and FORMAT clause

CREATE EXTERNAL TABLE schemaname.tablename
(
"ID" bigint,
  "time" timestamp without time zone,
  "ShipAddress4" character(40),
  "EntrySystemDate" timestamp without time zone,
  "CorpAcctName" character(40),
  "Customer" character(7),
  "CustomerName" character(30),
  "SalesOrder" character(6),
  "OrderStatus" character(1),
  "MStockCode" character(30),
  "ShipPostalCode" character(9),
  "CustomerPoNumber" character(30),
  "OrderDate" timestamp without time zone,
  "ReqShipDate" timestamp without time zone,
  "DateValue" timestamp without time zone,
  "MOrderQty" numeric(9,0),
  "MPrice" numeric(9,0),
  "CustomerClass" character(2),
  "ProductClass" character(4),
  "ProductGroup" character(10),
  "StockUom" character(3),
  "DispatchCount" integer,
  "MWarehouse" character(2),
  "AlphaValue" varchar(100)
)
 LOCATION (
    'pxf://path/to/hdfs?profile=HdfsTextSimple'
)
 FORMAT 'csv' (delimiter ';' null '' quote ';')
ENCODING 'UTF8';

Finding: an extra semicolon appears inside the data, which produces the extra column. But I am still unable to supply a correct FORMAT clause. Please guide me: how do I resolve the "extra data after last expected column" error?

What format clause should I use?

Any help would be much appreciated!

Upvotes: 0

Views: 234

Answers (1)

Mike

Reputation: 41

If you append the following to your external table definition, after the ENCODING clause, it should resolve the failure when only a small number of rows are affected:

LOG ERRORS INTO my_err_table SEGMENT REJECT LIMIT 1 PERCENT;

Here is a reference on this syntax: http://gpdb.docs.pivotal.io/4320/ref_guide/sql_commands/CREATE_EXTERNAL_TABLE.html
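Roughly, the tail of the definition from the question would then look like this. This is only a sketch applying the suggestion above: my_err_table is a placeholder name for the error table, and the 1 PERCENT reject limit can be tuned per the linked reference.

  ...
  "AlphaValue" varchar(100)
)
LOCATION (
    'pxf://path/to/hdfs?profile=HdfsTextSimple'
)
FORMAT 'csv' (delimiter ';' null '' quote ';')
ENCODING 'UTF8'
LOG ERRORS INTO my_err_table SEGMENT REJECT LIMIT 1 PERCENT;

-- Rows rejected because of the extra delimiter can then be inspected:
SELECT * FROM my_err_table;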

Upvotes: 2
