pranay kumar

Reputation: 404

Process multiple input rows using UDJC step in PDI

I need to do the following. The input looks like this:

State city
NY    joe 
NY    jane 
LA    zorro 
LA    steve

and the output should be:

State city
NY    joe, jane
LA    steve, zorro

How can I do this? I assume I have to process multiple input rows, but when I call getRow() a second time to read a new row, it throws an array-out-of-bounds exception. Please help. TIA

Here's the logic I've tried:

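// Assumes the input is already sorted by STATE, so rows of one state arrive together.
// PDI calls processRow() repeatedly; each call reads one row, and the fields below
// carry the running group between calls.
String prevState = null; // state of the group being accumulated
String cities = null;    // cities collected so far for prevState

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();

    if (r == null) {
        // End of input: emit the last group, then tell PDI this step is finished.
        if (prevState != null) {
            emitGroup();
        }
        setOutputDone();
        return false;
    }

    String state = get(Fields.In, "STATE").getString(r);
    String city = get(Fields.In, "CITY").getString(r);

    if (prevState == null) {
        // First row: start the first group.
        prevState = state;
        cities = city;
    } else if (prevState.equals(state)) {
        // Same state: compare with equals(), not ==, and append the city.
        cities = cities + ", " + city;
    } else {
        // State changed: emit the finished group and start a new one.
        emitGroup();
        prevState = state;
        cities = city;
    }

    return true; // ask PDI to call processRow() again for the next row
}

// Writes one output row (STATE, CITIES) for the current group.
private void emitGroup() throws KettleException
{
    Object[] outputRow = RowDataUtil.allocateRowData(data.outputRowMeta.size());
    get(Fields.Out, "STATE").setValue(outputRow, prevState);
    get(Fields.Out, "CITIES").setValue(outputRow, cities);
    putRow(data.outputRowMeta, outputRow);
}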

Upvotes: 0

Views: 1121

Answers (1)

Brian.D.Myers

Reputation: 2518

You don't need a UDJC for that. Sort your input by "State", either with a Sort Rows step or with an ORDER BY clause in the SQL if you are using a Table Input step. Then add a Group By step: set State as the Group Field, put "city" in the Subject field of the aggregate, and choose "Concatenate strings separated by ," as the aggregate type.
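To make the sort-then-group idea concrete, here is a minimal plain-Java sketch of the same logic outside PDI. It is only an illustration, not what the Group By step runs internally: the class name GroupCitiesSketch and the method groupCities are hypothetical, each row is assumed to be a {state, city} pair, and the ", " separator is chosen to match the output asked for in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupCitiesSketch {

    // Hypothetical helper: each row is assumed to be {state, city}.
    static Map<String, String> groupCities(List<String[]> rows) {
        // Rough equivalent of Sort Rows: a stable sort by state,
        // so rows with the same state become adjacent.
        List<String[]> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((String[] row) -> row[0]));

        // Rough equivalent of Group By with string concatenation:
        // one entry per state, cities joined with ", ".
        Map<String, String> result = new LinkedHashMap<>();
        for (String[] row : sorted) {
            result.merge(row[0], row[1], (a, b) -> a + ", " + b);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"NY", "joe"},
            new String[] {"NY", "jane"},
            new String[] {"LA", "zorro"},
            new String[] {"LA", "steve"});

        // Prints one line per state, in sorted state order.
        for (Map.Entry<String, String> e : groupCities(rows).entrySet()) {
            System.out.println(e.getKey() + "    " + e.getValue());
        }
    }
}
```

Because the sort is stable, cities within a state keep their input order. The sort matters for the same reason it does in PDI: grouping adjacent rows only works when all rows of one state arrive together.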

Upvotes: 1
