We are importing data into Salesforce through Talend, and we have multiple items with the same external ID.
The import fails with the error "Duplicate external id specified" because of how upsert works in Salesforce. For now, we have worked around it by setting the commit size of tSalesforceOutput to 1, but that only works for small amounts of data; otherwise it would exhaust the Salesforce API limits.
Is there a known approach to this in Talend? For example, a way to ensure that items with the same external ID end up in different "commits" of tSalesforceOutput?
Here is the design for the solution I wish to propose:
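Use (Boolean)globalMap.get("finish") == false as the loop condition (the loop keeps iterating while it is true), and, in a tJava executed after each iteration, set "finish" to true once tUniqRow finds no more duplicates:
if (((Integer)globalMap.get("tUniqRow_1_NB_DUPLICATES")) == 0) globalMap.put("finish", true);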
Explanation, using the following sample data:
line 1
line 2
line 3
line 2
line 4
line 2
line 5
line 3
On the 1st iteration, 5 unique records are pushed into tLogRow, 3 duplicates are written to file A, and "finish" is left unchanged because there are duplicates.
On the 2nd iteration, the same operations are repeated for 2 unique records and 1 duplicate.
On the 3rd iteration, there is only 1 unique record and no more duplicates, so "finish" is set to true and the loop ends automatically.
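To check the walkthrough above, here is a plain-Java simulation of the loop (not generated Talend code). A HashMap stands in for Talend's globalMap, a HashSet mimics tUniqRow (first occurrence of a key is unique, later ones are duplicates), and a list stands in for file A; the class and method names are hypothetical. Note that "finish" must be initialized to false before the loop, which the sketch does explicitly.

```java
import java.util.*;

public class DuplicateLoopSimulation {
    /** Runs the loop and returns {uniques, duplicates} for each iteration. */
    static List<int[]> run(List<String> input) {
        // Stand-in for Talend's globalMap; in a real job it is provided by the generated code.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("finish", false);
        List<String> fileA = new ArrayList<>(input); // "file A" starts with the original input
        List<int[]> stats = new ArrayList<>();

        while ((Boolean) globalMap.get("finish") == false) {
            // tUniqRow: first occurrence goes to "uniques", later occurrences to "duplicates".
            Set<String> seen = new HashSet<>();
            List<String> uniques = new ArrayList<>();
            List<String> duplicates = new ArrayList<>();
            for (String row : fileA) {
                if (seen.add(row)) uniques.add(row); else duplicates.add(row);
            }
            globalMap.put("tUniqRow_1_NB_DUPLICATES", duplicates.size());

            // Uniques would go to tSalesforceOutput (tLogRow in the answer);
            // duplicates overwrite file A for the next iteration.
            stats.add(new int[] {uniques.size(), duplicates.size()});
            fileA = duplicates;

            // tJava: the end-of-loop check from the answer.
            if (((Integer) globalMap.get("tUniqRow_1_NB_DUPLICATES")) == 0) globalMap.put("finish", true);
        }
        return stats;
    }

    public static void main(String[] args) {
        List<int[]> stats = run(List.of("line 1", "line 2", "line 3", "line 2",
                                        "line 4", "line 2", "line 5", "line 3"));
        for (int i = 0; i < stats.size(); i++) {
            System.out.println("Iteration " + (i + 1) + ": " + stats.get(i)[0]
                    + " unique, " + stats.get(i)[1] + " duplicate(s)");
        }
        // Iteration 1: 5 unique, 3 duplicate(s)
        // Iteration 2: 2 unique, 1 duplicate(s)
        // Iteration 3: 1 unique, 0 duplicate(s)
    }
}
```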
Alternatively, you can use another global variable to set the Salesforce commit level, using the expression (Integer)globalMap.get("commitLevel") in the tSalesforceOutput commit level field. Set this variable to 200 by default, and set it to 1 in the tJava if there are any duplicates. At the same time, set "finish" to true unconditionally (without testing the number of duplicates): the commit level will then be 200 for the 1st iteration and 1 for the 2nd, and no more than 2 iterations are needed.
Choose between the two options depending on the expected number of duplicates; note that either way, no change to the job design is needed.
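To make the trade-off concrete, here is a back-of-the-envelope sketch in plain Java. It assumes that one commit corresponds to one API call and that the commit level is 200; roundSizes is a hypothetical parameter holding the number of rows pushed in each iteration of the loop described above (for the sample data: 5, 2, 1).

```java
public class CommitCostEstimate {
    // Assumption: one commit == one Salesforce API call.
    static int ceilDiv(int a, int b) { return (a + b - 1) / b; }

    /** Loop until no duplicates: every iteration is pushed at the full commit level. */
    static int iterative(int[] roundSizes, int commitLevel) {
        int calls = 0;
        for (int size : roundSizes) calls += ceilDiv(size, commitLevel);
        return calls;
    }

    /** Two-iteration variant: 1st iteration at full commit level, then one row per commit. */
    static int twoIterations(int[] roundSizes, int commitLevel) {
        int calls = ceilDiv(roundSizes[0], commitLevel);
        for (int i = 1; i < roundSizes.length; i++) calls += roundSizes[i];
        return calls;
    }

    public static void main(String[] args) {
        int[] sample = {5, 2, 1};     // the sample data from the answer
        int[] larger = {9500, 500};   // hypothetical: 10,000 rows, 500 IDs appearing twice
        System.out.println(iterative(sample, 200) + " vs " + twoIterations(sample, 200)); // 3 vs 4
        System.out.println(iterative(larger, 200) + " vs " + twoIterations(larger, 200)); // 51 vs 548
    }
}
```

With few duplicates the two options cost about the same; as the number of duplicates grows, the commit-level-1 pass dominates the count.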
I think it should solve your problem. Let me know.
Regards,
TRF
Do you mean you have the same record (the same account for example) twice or more in the input?
If so, can't you try to eliminate the duplicates and keep only the record you need to push to Salesforce?
Otherwise, if each record carries specific information (so you need all the input records to build a complete one in Salesforce), consider merging the records before pushing the result into Salesforce.
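A minimal plain-Java sketch of such a merge, assuming rows are held as field-to-value maps and that a later non-null value should win over an earlier one. The field name External_Id__c and this merge rule are assumptions; adapt them to your data and to whichever rule defines the "complete" record for you.

```java
import java.util.*;

public class MergeByExternalId {
    // Hypothetical name of the external ID field; replace with your own.
    static final String EXTERNAL_ID = "External_Id__c";

    /**
     * Merges rows sharing an external ID into one row, in input order:
     * a later non-null value overwrites an earlier one, and a null value
     * never erases a value already collected.
     */
    static List<Map<String, String>> merge(List<Map<String, String>> rows) {
        Map<String, Map<String, String>> byId = new LinkedHashMap<>();
        for (Map<String, String> row : rows) {
            Map<String, String> merged =
                    byId.computeIfAbsent(row.get(EXTERNAL_ID), id -> new LinkedHashMap<>());
            row.forEach((field, value) -> {
                if (value != null) merged.put(field, value);
            });
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        Map<String, String> a = new HashMap<>();
        a.put(EXTERNAL_ID, "42");
        a.put("Name", "Acme");
        Map<String, String> b = new HashMap<>();
        b.put(EXTERNAL_ID, "42");
        b.put("Name", null);          // missing in this row: keeps "Acme"
        b.put("Phone", "555-0100");   // only present in this row
        System.out.println(merge(List.of(a, b)));
    }
}
```

Each external ID then appears exactly once, so the result can be upserted with any commit size.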
Finally, if you can't do that, write the duplicates to a temporary location, push all the other records into Salesforce, and repeat this process until there are no more duplicates.
Personally, if you can't simply eliminate the duplicates, I prefer the 2nd approach (merging), as it results in fewer Salesforce API calls.
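The rounds produced by this iteration can also be computed in a single pass. A plain-Java sketch (not a Talend component): the k-th occurrence of each external ID goes into round k, and each round is split into batches of at most commitSize rows, so no batch ever contains the same external ID twice and earlier occurrences are always upserted first. Row, batch and commitSize are hypothetical names.

```java
import java.util.*;

public class ExternalIdBatcher {
    // Hypothetical record type: only the external ID matters for batching.
    record Row(String externalId, String payload) {}

    /** Splits rows into batches of at most commitSize rows with no repeated external ID per batch. */
    static List<List<Row>> batch(List<Row> rows, int commitSize) {
        Map<String, Integer> occurrences = new HashMap<>();
        List<List<Row>> rounds = new ArrayList<>();
        for (Row row : rows) {
            // 0 for the 1st occurrence of an ID, 1 for the 2nd, and so on.
            int round = occurrences.merge(row.externalId(), 1, Integer::sum) - 1;
            while (rounds.size() <= round) rounds.add(new ArrayList<>());
            rounds.get(round).add(row);
        }
        // Within a round all IDs are distinct, so chunking a round is always safe.
        List<List<Row>> batches = new ArrayList<>();
        for (List<Row> round : rounds) {
            for (int i = 0; i < round.size(); i += commitSize) {
                batches.add(new ArrayList<>(round.subList(i, Math.min(i + commitSize, round.size()))));
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("1", "a"), new Row("2", "b"), new Row("3", "c"), new Row("2", "d"),
                new Row("4", "e"), new Row("2", "f"), new Row("5", "g"), new Row("3", "h"));
        for (List<Row> b : batch(rows, 200)) {
            System.out.println(b.stream().map(Row::externalId).toList());
        }
        // [1, 2, 3, 4, 5]
        // [2, 3]
        // [2]
    }
}
```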
Hope this helps.
TRF