Reputation: 399
We have source and target system. trying to import the data from SQL server 2012 to Pivotal Hadoop (PHD 3.0) version using talend tool.
Getting error:
ERROR: extra data after last expected column (seg0 slice1 datanode.domain.com:40000 pid=15035)
Detail: External table pick_report_stg0, line 5472 of pxf://masternnode/path/to/hdfs?profile=HdfsTextSimple: "5472;2016-11-28 08:39:54.217;;2016-11-15 00:00:00.0;SAMPLES;0005525;MORGAN -EVENTS;254056;1;IHBL-NHO..."
We tried
We have identified the BAD line as [hdfs@mdw ~]$ hdfs dfs -cat /path/to/hdfs|grep 3548
3548;2016-11-28 04:21:39.97;;2016-11-15 00:00:00.0;SAMPLES;0005525;MORGAN -EVENTS;254056;1;IHBL-NHO-13OZ-01;0;ROC NATION; NH;2016-11-15 00:00:00.0;2016-11-15 00:00:00.0;;2.0;11.99;SA;SC01;NH02;EA;1;F2;NEW PKG ONLY PLEASE!! BY NOON
Structure of External table and Format clause
CREATE EXTERNAL TABLE schemaname.tablename
(
"ID" bigint,
"time" timestamp without time zone,
"ShipAddress4" character(40),
"EntrySystemDate" timestamp without time zone,
"CorpAcctName" character(40),
"Customer" character(7),
"CustomerName" character(30),
"SalesOrder" character(6),
"OrderStatus" character(1),
"MStockCode" character(30),
"ShipPostalCode" character(9),
"CustomerPoNumber" character(30),
"OrderDate" timestamp without time zone,
"ReqShipDate" timestamp without time zone,
"DateValue" timestamp without time zone,
"MOrderQty" numeric(9,0),
"MPrice" numeric(9,0),
"CustomerClass" character(2),
"ProductClass" character(4),
"ProductGroup" character(10),
"StockUom" character(3),
"DispatchCount" integer,
"MWarehouse" character(2),
"AlphaValue" varchar(100)
)
LOCATION (
'pxf://path/to/hdfs?profile=HdfsTextSimple'
)
FORMAT 'csv' (delimiter ';' null '' quote ';')
ENCODING 'UTF8';
Finding : Extra semi colon appeared which causes extra data. But I am still unable to supply correct format clause . Please guide How do I remove extra data column error.
What format clause should I use.
Any help on it would be much Appreciated !
Upvotes: 0
Views: 234
Reputation: 41
If you append the following to your external table definition, after the ENCODING clause, it should help to resolve the issue where a small number of rows fail due to this issue:
LOG ERRORS INTO my_err_table SEGMENT REJECT LIMIT 1 PERCENT;
Here is a reference on this syntax: http://gpdb.docs.pivotal.io/4320/ref_guide/sql_commands/CREATE_EXTERNAL_TABLE.html
Upvotes: 2