extract or filter MapType of Spark DataFrame

Question

I have a DataFrame that contains various columns. One column contains a Map[Integer,Integer[]]. It looks like { 2345 -> [1,34,2]; 543 -> [12,3,2,5]; 2 -> [3,4]} Now what I need to do is filter out some keys. I have a Set of Integers (javaIntSet) in Java with which I should filter such that

col(x).keySet.isin(javaIntSet)

ie. the above map should only contain the key 2 and 543 but not the other two and should look like {543 -> [12,3,2,5]; 2 -> [3,4]} after filtering.

Documentation of how to use the Java Column Class is sparse. How do I extract the col(x) such that I can just filter it in java and then replace the cell data with a filtered map. Or are there any useful functions of columns I am overlooking. Can I write an UDF2,Set,Map I can write an UDF1 but I am not so sure how it works with more complex parameters.

Generally the javaIntSet is only a dozen and usually less than a 100 values. The Map usually also has only a handful entries (0-5 usually).

I have to do this in Java (unfortunately) but I am familiar with Scala. A Scala answer that I translate myself to Java would already be very helpful.

extract or filter MapType of Spark DataFrame

Answers (1)

Related Questions