Metadata

Reputation: 2083

Is there a way to filter out the elements of a List by checking them against elements of an Array in Scala?

I have a List in Scala:

val hdtList = hdt.split(",").toList
hdtList.foreach(println)
Output:
forecast_id bigint
period_year bigint
period_num bigint
period_name string
drm_org string
ledger_id bigint
currency_code string
source_system_name string
source_record_type string
gl_source_name string
gl_source_system_name string
year string
period string

There is also an Array, obtained by collecting a column of a DataFrame and splitting its values on commas:

val partition_columns   = spColsDF.select("partition_columns").collect.flatMap(x => x.getAs[String](0).split(","))
partition_columns.foreach(println)
Output:
source_system_name
period_year

Is there a way to filter out the elements `source_system_name string` and `period_year bigint` from hdtList by checking them against the elements of the Array partition_columns, and put them into a new List? I am confused about applying filter/map to the right collections and comparing them. Could anyone tell me how I can achieve that?

Upvotes: 1

Views: 83

Answers (3)

Sebastian Celestino

Reputation: 1428

In your case you need to use filter, because you need to remove elements from hdtList.

map is a function that transforms elements; there is no way to remove elements from a collection using map. If you have a List of X elements, after map you still have X elements, no fewer and no more.
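A small sketch of the difference, using a few hypothetical entries in the same "name type" format as hdtList (not the question's full data): map keeps the element count and only changes each element, while filter keeps or drops elements based on a predicate.

```scala
// Hypothetical sample values in the same "name type" format as hdtList
val cols = List("period_year bigint", "ledger_id bigint", "year string")

// map transforms every element: 3 elements in, 3 elements out
val names = cols.map(c => c.takeWhile(_ != ' '))

// filter keeps only the elements for which the predicate returns true
val bigints = cols.filter(c => c.endsWith(" bigint"))

println(names)   // List(period_year, ledger_id, year)
println(bigints) // List(period_year bigint, ledger_id bigint)
```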

val newList = hdtList.filter( x => partition_columns.exists(x.startsWith) )

Be aware that combining filter with exists over two collections is O(N×M): every element of hdtList is compared against every element of partition_columns. If your lists are large, this can become a performance problem.

One way to solve that problem is to use a Set.

Upvotes: 2

jwvh

Reputation: 51271

It might be useful to have both lists: the hdt elements referenced in partition_columns, and the hdt elements that aren't.

val (pc, notPc) = hdtList.partition(w =>
  partition_columns.contains(w.takeWhile(_ != ' ')))
//pc: List[String] = List(period_year bigint, source_system_name string)
//notPc: List[String] = List(forecast_id bigint, period_num bigint, ... etc.
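Comparing the extracted name with takeWhile(_ != ' ') is an exact match, whereas the startsWith approach in the other answers is a prefix match. The two give the same result for the question's data, but they can differ. The sketch below uses a hypothetical partition column named `period` (not from the question) to show the difference.

```scala
val hdtList = List("period_year bigint", "period_num bigint", "period string", "year string")
// Hypothetical partition column list, chosen to show the prefix-matching pitfall
val partition_columns = Array("period")

// Prefix matching: "period" is a prefix of three different entries
val byPrefix = hdtList.filter(x => partition_columns.exists(p => x.startsWith(p)))

// Exact matching on the column name (text before the first space)
val (pc, notPc) = hdtList.partition(w => partition_columns.contains(w.takeWhile(_ != ' ')))

println(byPrefix) // List(period_year bigint, period_num bigint, period string)
println(pc)       // List(period string)
println(notPc)    // List(period_year bigint, period_num bigint, year string)
```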

Upvotes: 1

Roberto Bonvallet

Reputation: 33399

Unless I'm misunderstanding the question, I think this is what you need:

val filtered = hdtList.filter { x =>
  !partition_columns.exists { col => x.startsWith(col) }
}
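The negated filter above can also be written with filterNot, which keeps the elements for which the predicate is false. A short sketch with a few sample entries taken from the question's output:

```scala
val hdtList = List("period_year bigint", "ledger_id bigint", "source_system_name string", "year string")
val partition_columns = Array("source_system_name", "period_year")

// filter with a negated predicate, as in the answer above
val filtered = hdtList.filter { x => !partition_columns.exists(col => x.startsWith(col)) }

// filterNot expresses the same selection without the explicit negation
val filtered2 = hdtList.filterNot(x => partition_columns.exists(col => x.startsWith(col)))

println(filtered2) // List(ledger_id bigint, year string)
```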

Upvotes: 2
