hlagvankar

Reputation: 319

scala increment nested for comprehension

I am working on detecting PI/SI information within a given Spark dataset. I have a set of rules (in CSV format) as below:

Rule_No,Target,Pattern,Fuzzy_Match,EPDR,Category,Active
1,Name,name,true,PI - Name,General/ID,true
1,Name,identity,true,PI - Name,General/ID,true
1,Content,Smith,true,PI - Name,General/ID,true
1,Content,Jones,true,PI - Name,General/ID,true
1,Content,Williams,true,PI - Name,General/ID,true
5,Name,Gender,true,PI - Gender,General/ID,true
5,Content,M,false,PI - Gender,General/ID,true
5,Content,F,false,PI - Gender,General/ID,true
5,Content,Male,false,PI - Gender,General/ID,true
5,Content,Female,false,PI - Gender,General/ID,true

What I am trying to do is iterate over the dataset's columns and apply each of these rules to check whether a particular column contains PII. Say I have a column called name, and a rule says to scan the content of this column for the pattern Smith. If I find a match, I know this column is a PI column; I then move to the next column and apply each rule until I find a match. I am using a nested for comprehension to iterate over the list of columns and the list of rules. What I want is to move to the next column as soon as I find a match, instead of applying the remaining rules. I have written code like this:

for {
  c <- ds.columns.toList
  rule <- rules if rule.active && checkPII(ds, c, rule.target, rule.pattern, rule.fuzzyMatch)
} yield {
  <return PII information>
}

But this applies every rule to the same column even after it gets a match. How can I move on to the next column instead of applying the remaining rules?

Upvotes: 1

Views: 130

Answers (1)

Tim

Reputation: 27356

A for comprehension desugars into flatMap, withFilter, and map calls, which always check every element. You need to use collectFirst, which stops at the first match.

ds.columns.toList.flatMap { c =>
  rules.collectFirst {
    case rule if rule.active && checkPII(ds, c, rule.target, rule.pattern, rule.fuzzyMatch) =>
      <return PII information>
  }
}

collectFirst returns an Option: Some for the first matching rule, or None if no rule matches. Using flatMap means that the None results are discarded, so you just get a list of the matching values.
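To see the difference without Spark, here is a minimal sketch that counts how often the check runs in each version. Everything in it is an assumption made for illustration: the Rule case class, the sample column names, and a simplified two-argument checkPII that only looks at the column name rather than scanning the data.

```scala
object ShortCircuitDemo {
  // Hypothetical rule type: pattern, fuzzyMatch, epdr, active.
  final case class Rule(pattern: String, fuzzyMatch: Boolean, epdr: String, active: Boolean)

  val rules: List[Rule] = List(
    Rule("name", true, "PI - Name", true),
    Rule("identity", true, "PI - Name", true),
    Rule("gender", true, "PI - Gender", true)
  )

  // Sample column names; the first one matches two of the rules above.
  val columns: List[String] = List("identity_name", "gender", "amount")

  // Counts how often the (assumed expensive) check is invoked.
  var calls = 0

  // Simplified stand-in for the real check: matches on the column name only.
  def checkPII(column: String, rule: Rule): Boolean = {
    calls += 1
    val c = column.toLowerCase
    val p = rule.pattern.toLowerCase
    if (rule.fuzzyMatch) c.contains(p) else c == p
  }

  // Nested for comprehension: every rule is tried on every column.
  def viaFor(): List[(String, String)] =
    for {
      c <- columns
      rule <- rules if rule.active && checkPII(c, rule)
    } yield (c, rule.epdr)

  // collectFirst: stops at the first matching rule for each column.
  def viaCollectFirst(): List[(String, String)] =
    columns.flatMap { c =>
      rules.collectFirst {
        case rule if rule.active && checkPII(c, rule) => (c, rule.epdr)
      }
    }

  def main(args: Array[String]): Unit = {
    calls = 0
    println(viaFor())
    println(s"for comprehension checks: $calls")
    calls = 0
    println(viaCollectFirst())
    println(s"collectFirst checks: $calls")
  }
}
```

With these sample values the for comprehension runs all 3 x 3 checks and reports identity_name twice, while the collectFirst version skips the remaining rules once a column has matched and yields at most one result per column.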

Upvotes: 2
