Arnoux Olivier

Reputation: 47

REGEX in Pentaho to clean a join column in my data

I have been struggling with a column in my data where the source values are dirty, so my joins fail to find matches.

What I am trying to do is:

  1. Select the column [website_reference_number], among others
  2. Use a regex to check [website_reference_number] against certain specs
  3. Trim the value so that no inconsistencies are left and my joins are clean

For example:

if [website_reference_number] = "CC-DE-109"                >>> Leave it like that

if [website_reference_number] = "CC-DE-109-Duplicate"      >>> change to CC-DE-109

if [website_reference_number] = "CC-DE-109 Duplicate"      >>> change to CC-DE-109

if [website_reference_number] = "CC-DE-109-Duplicate-Duplic" >>> change to CC-DE-109

So the rule, in human terms, is {Any 2 Letters}-{Any 2 Letters}-{AnyAmountOfNumbers}, and anything after that should be removed.

Upvotes: 1

Views: 155

Answers (1)

Shafizadeh

Reputation: 10360

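Use this pattern and replace each match with the three captured groups joined by hyphens (for example $1-$2-$3 in Java-style replacement syntax):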

/([A-Z]{2})-([A-Z]{2})-([0-9]+).*/
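To check the pattern outside Pentaho, here is a minimal sketch in Python using the sample values from the question. The function name `clean_reference` and the `^`/`$` anchors are my own additions, not part of the pattern above, and the replacement is written `\1-\2-\3` because that is Python's group syntax. Treat it as a quick way to test the idea, not as the Pentaho step itself.

```python
import re

# Two uppercase letters, hyphen, two uppercase letters, hyphen, one or more digits.
# The trailing .* swallows whatever follows so the substitution discards it.
# The ^ and $ anchors are an assumption added here so only whole values are rewritten.
REFERENCE = re.compile(r"^([A-Z]{2})-([A-Z]{2})-([0-9]+).*$")


def clean_reference(value: str) -> str:
    """Return the leading XX-XX-123 part; values that do not match are only stripped."""
    return REFERENCE.sub(r"\1-\2-\3", value.strip())


if __name__ == "__main__":
    samples = [
        "CC-DE-109",
        "CC-DE-109-Duplicate",
        "CC-DE-109 Duplicate",
        "CC-DE-109-Duplicate-Duplic",
    ]
    for sample in samples:
        print(f"{sample!r} -> {clean_reference(sample)!r}")
```

All four sample values come out as CC-DE-109. Note that the character classes only accept uppercase letters, so a value such as "cc-de-109-Duplicate" would be left as it is; if your data mixes case, you would need to widen the classes or match case-insensitively.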


Upvotes: 1
