decoded051

Reputation: 91

Is it possible to define the cluster name before spraying a file via ECL IDE?

So, I was working on spraying a DATASET object back on the cluster using the following technique:

myDS := DATASET([{{'James','Walters','C'},
                           {'Jessie','Blenger'},
                           {'Horatio','Walters'}},
                          {{'Anne','Winston'},
                           {'Sant','Aclause'},
                           {'Elfin','And'}}], personRecord);

//THE MOST COMMON TECHNIQUE USED FOR THIS IS:

OUTPUT(myDS,,'sprayed::target_scope',NAMED('FILE_HAS_BEEN_SPRAYED'), OVERWRITE);

I wanted to know if there is any way to specify the name of the cluster to which I want to spray an ECL DATASET object (myDS in my case).

Initially I thought of using STD.File.SprayDelimited or DFUPlus; however, these are useful only for spraying a file (along with specifying a cluster name), not an ECL DATASET object. The DFU command line has a parameter called dstcluster, where I can specify the cluster name, but again, that is just for a file.

Upvotes: 0

Views: 55

Answers (1)

Richard Taylor

Reputation: 780

First, your myDS DATASET definition is not an "object" (ECL is not object-oriented). It simply defines an inline set of records, in memory, that is treated the same as a file on disk.

Next, you are conflating two separate things:

  1. The OUTPUT action writes records from memory to disk. Since the HPCC Systems platform is a parallel processing platform, that means that each node writes the records in its memory to its own disk. That makes every dataset on disk a distributed set of physical files that are treated as a single logical entity.
  2. The De-spray action is designed to read those distributed file parts and write all the data in them to a single physical file on your designated Landing Zone. A Landing Zone is a single computer (not a "cluster") that is configured as part of your HPCC environment's middleware but is not part of any Thor or ROXIE cluster.

If what you want to do is combine those two actions into a single step, then you need to look at the ECL Language Reference. In the Scope and Logical Filenames article there is a section called Landing Zone Files (on page 93 in the 9.4.24 release of the ECL Language Reference) that will tell you how to name the file in your OUTPUT action so that it is written directly to your Landing Zone as a single physical file.
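As a rough sketch of what that naming looks like, assuming the `~file::<LZ IP>::<path parts>` form described in that section: the IP address, directory, and file name below are placeholders, and NameRec is a hypothetical flat layout, since the question's personRecord is not shown.

```ecl
// Hypothetical flat layout (stand-in for the question's personRecord).
NameRec := RECORD
  STRING20 firstName;
  STRING20 lastName;
END;

names := DATASET([{'James', 'Walters'},
                  {'Anne', 'Winston'}], NameRec);

// 'file' scope, then the Landing Zone IP, then each directory of the
// path separated by '::'. 10.0.0.1 and data/lz are placeholders only.
lzFile := '~file::10.0.0.1::data::lz::names_out';

// Written as CSV so the single physical file is human-readable.
OUTPUT(names, , lzFile, CSV, OVERWRITE);
```

Check the Landing Zone Files section for the exact rules that apply to your environment's path before relying on this form.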

That is not something I would recommend for very large datasets, since doing so would drastically decrease performance on your Thor cluster. Remember, the de-spray action is done in a DFU workunit that doesn't directly involve the Thor cluster. So for very large datasets you will always want to OUTPUT the data to the distributed file parts (the default behavior) so that the de-spray can be done as a separate action, freeing the Thor cluster for other work.
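For comparison, here is a minimal sketch of the two-step approach, using STD.File.DeSpray from the Standard Library with only its three required parameters. The Landing Zone IP and destination path are placeholders, and NameRec is again a hypothetical layout standing in for personRecord.

```ecl
IMPORT STD;

// Hypothetical flat layout (stand-in for the question's personRecord).
NameRec := RECORD
  STRING20 firstName;
  STRING20 lastName;
END;

names := DATASET([{'James', 'Walters'},
                  {'Anne', 'Winston'}], NameRec);

// Step 1: write the records as a distributed logical file (default behavior).
writeFile := OUTPUT(names, , '~sprayed::target_scope', CSV, OVERWRITE);

// Step 2: de-spray the logical file to one physical file on the Landing Zone.
// '10.0.0.1' and '/data/lz/target_scope.csv' are placeholders only.
desprayFile := STD.File.DeSpray('~sprayed::target_scope',
                                '10.0.0.1',
                                '/data/lz/target_scope.csv');

SEQUENTIAL(writeFile, desprayFile);
```

SEQUENTIAL is used here only to make sure the file exists before the de-spray is requested; the copy itself runs in the DFU workunit.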

Upvotes: 2
