Edmondo

Reputation: 20090

Talend avoid duplicate external ID with Salesforce Output

We are importing data on Salesforce through Talend and we have multiple items with the same internal id.

Such an import fails with the error "Duplicate external id specified" because of how upsert works in Salesforce. For the moment we have worked around it by setting the commit size of tSalesforceOutput to 1, but that only works for small amounts of data; otherwise it would exhaust the Salesforce API limits.

Is there a known approach to this in Talend? For example, ensuring that items with the same external ID end up in different "commits" of tSalesforceOutput?

Upvotes: 0

Views: 1001

Answers (2)

TRF

Reputation: 801

Here is the design for the solution I wish to propose: (screenshot of the job design)

  • tSetGlobalVar is here to initialize the variable "finish" to false.
  • tLoop starts a while loop with (Boolean)globalMap.get("finish") == false as an end condition.
  • tFileCopy is used to copy the initial file (A for example) to a new one (B).
  • tFileInputDelimited reads file B.
  • tUniqRow eliminates duplicates. Unique records go to tLogRow, which you have to replace with tSalesforceOutput. Duplicate records, if any, go to a tFileOutputDelimited called A (same name as the original file) with the option "Throw an error if the file already exists" unchecked.
  • An OnComponentOk link after tUniqRow triggers the tJava, which sets the new value of the global "finish" variable with the following code:
    if (((Integer)globalMap.get("tUniqRow_1_NB_DUPLICATES")) == 0) globalMap.put("finish", true);

Explanation with the following sample data:
line 1
line 2
line 3
line 2
line 4
line 2
line 5
line 3

On the 1st iteration, 5 unique records are pushed into tLogRow, 3 duplicates are pushed into file A, and "finish" is not changed since there are duplicates.
On the 2nd iteration, the operations are repeated for 2 unique records and 1 duplicate.
On the 3rd iteration, the operations are repeated for 1 unique record, and as there are no more duplicates, "finish" is set to true and the loop ends automatically.
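The loop above can be sketched in plain Java (this is not Talend-generated code; the class and method names are my own). Each pass keeps the first occurrence of each key, as tUniqRow does, and carries the rest into the next pass until none remain:

```java
import java.util.*;

public class IterativeDedup {
    // Each pass: first occurrence of a record goes to the "uniques" batch
    // (pushed to tSalesforceOutput); repeats go back to file A for the
    // next pass. Loop until a pass produces no duplicates.
    public static List<List<String>> passes(List<String> records) {
        List<List<String>> batches = new ArrayList<>();
        List<String> remaining = records;
        while (!remaining.isEmpty()) {
            Set<String> seen = new LinkedHashSet<>();
            List<String> dups = new ArrayList<>();
            for (String r : remaining) {
                if (!seen.add(r)) dups.add(r);   // already seen => duplicate
            }
            batches.add(new ArrayList<>(seen));  // this pass's unique records
            remaining = dups;                    // next pass re-reads "file A"
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "line 1", "line 2", "line 3", "line 2",
            "line 4", "line 2", "line 5", "line 3");
        List<List<String>> batches = passes(input);
        for (int i = 0; i < batches.size(); i++)
            System.out.println("Iteration " + (i + 1) + ": " + batches.get(i));
    }
}
```

On the sample data this yields exactly the three iterations described above (5, then 2, then 1 unique records).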

Here is the final result:
(screenshot of the execution output)

You can also decide to use another global variable to set the Salesforce commit level (using the syntax (Integer)globalMap.get("commitLevel")). This variable is set to 200 by default, and to 1 in the tJava if there are any duplicates. At the same time, set "finish" to true (without testing the number of duplicates): you'll get a commit level of 200 for the 1st iteration and 1 for the 2nd, and you'll never need more than 2 iterations.
Which choice is better depends on the number of potential duplicates, but note that either works without any change to the job design.
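This two-iteration variant can also be sketched in plain Java (again a rough sketch, not Talend code): the 1st pass takes every first occurrence with the default commit level of 200, and the 2nd pass takes all the leftovers with commit level 1, so records that still share an external ID cannot fail inside a single batch:

```java
import java.util.*;

public class TwoPassUpsert {
    // Split records into two passes: first occurrences (safe to batch
    // at commit level 200) and repeats (committed one by one, commit
    // level 1, so duplicate external IDs never share a batch).
    public static List<List<String>> split(List<String> records) {
        Set<String> firstPass = new LinkedHashSet<>();
        List<String> secondPass = new ArrayList<>();
        for (String r : records) {
            if (!firstPass.add(r)) secondPass.add(r);  // repeat => 2nd pass
        }
        return Arrays.asList(new ArrayList<>(firstPass), secondPass);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "line 1", "line 2", "line 3", "line 2",
            "line 4", "line 2", "line 5", "line 3");
        List<List<String>> passes = split(input);
        System.out.println("Pass 1 (commit 200): " + passes.get(0));
        System.out.println("Pass 2 (commit 1):   " + passes.get(1));
    }
}
```

The trade-off is visible here: the 2nd pass costs one API call per leftover record, which is why the choice depends on how many duplicates you expect.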

I think it should solve your problem. Let me know.

Regards,
TRF

Upvotes: 2

TRF

Reputation: 801

Do you mean you have the same record (the same account, for example) twice or more in the input?
If so, can't you eliminate the duplicates and keep only the record you need to push to Salesforce?
Otherwise, if each record carries specific information (so you need all the input records to build a complete one in Salesforce), consider merging the records before pushing the result into Salesforce.
And finally, if you can't do that, push the duplicates into a temporary space, push all records except the duplicates into Salesforce, and iterate over this process until there are no more duplicates.
Personally, if you can't simply eliminate the duplicates, I prefer the 2nd approach, as it's the solution with the fewest Salesforce API calls.
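The 2nd approach (merging) might look like this in plain Java: a hypothetical sketch (the field layout and names are my own, not from the question) where rows sharing an external ID collapse into one record, later non-null values filling the gaps, so each ID is upserted exactly once:

```java
import java.util.*;

public class MergeByKey {
    // Collapse rows that share an external ID (row[0]) into a single
    // record: keep the first non-null value seen for each field.
    static Map<String, String[]> merge(List<String[]> rows) {
        Map<String, String[]> byId = new LinkedHashMap<>();
        for (String[] row : rows) {
            byId.merge(row[0], row, (a, b) -> {
                String[] m = a.clone();
                for (int i = 1; i < m.length; i++)
                    if (m[i] == null) m[i] = b[i];  // fill missing fields
                return m;
            });
        }
        return byId;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[]{"ACC-1", "Acme", null},     // hypothetical sample
            new String[]{"ACC-1", null, "Paris"},
            new String[]{"ACC-2", "Beta", "Lyon"});
        for (String[] r : merge(rows).values())
            System.out.println(Arrays.toString(r));
    }
}
```

In Talend the same effect could be achieved with a tAggregateRow or tMap grouped on the external ID before the tSalesforceOutput.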

Hope this helps.
TRF

Upvotes: 1
