Chris

Reputation: 33

Apache Flink DataSet API: How to merge a Flink DataSet with itself to a new one?

I have a one-dimensional DataSet of type String:

DataSet<String> x = //['dog','cat','sheep']

I want to compare every String with all the others in this DataSet to analyse different string similarity algorithms. Therefore I need a resulting DataSet with the following structure:

DataSet<Tuple2<String,String>> y = //[{'dog','cat'},{'dog','sheep'},{'cat','sheep'}]

On this DataSet a flatMap function (or similar) can be applied to compare the Strings.

My problem is that I don't know which transformation to use. Maybe a transformation is not the right way to handle this.

In plain Java I would simply use two loops like this:

for (int i = 0; i < x.length; i++) {
    for (int j = i + 1; j < x.length; j++) {
        //do something with x[i] and x[j]
    }
}

Upvotes: 1

Views: 940

Answers (1)

Chesnay Schepler

Reputation: 1270

x.cross(x) should do the trick. This will execute a default cross.
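Note that a cross builds the full Cartesian product, so crossing x with itself also yields self-pairs such as {'dog','dog'} and both orderings of every pair. The following is a minimal plain-Java sketch (no Flink dependency) of that behaviour and of one way to reduce it to the unique pairs asked for in the question. The class and method names are hypothetical, and it assumes the strings are distinct, since equal strings are dropped by the comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; it models the result of a cross, not Flink itself.
class CrossPairsSketch {

    // Models a default cross: every element paired with every element,
    // including self-pairs and both orderings (n * n pairs in total).
    static List<String[]> cross(List<String> xs) {
        List<String[]> pairs = new ArrayList<>();
        for (String a : xs) {
            for (String b : xs) {
                pairs.add(new String[] {a, b});
            }
        }
        return pairs;
    }

    // Keeps only pairs whose first element sorts strictly before the second.
    // This drops self-pairs and one of the two mirrored orderings.
    static List<String[]> uniquePairs(List<String> xs) {
        List<String[]> result = new ArrayList<>();
        for (String[] p : cross(xs)) {
            if (p[0].compareTo(p[1]) < 0) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] p : uniquePairs(Arrays.asList("dog", "cat", "sheep"))) {
            System.out.println(p[0] + "," + p[1]);
        }
    }
}
```

In Flink, the analogous step would presumably be a filter applied to the crossed DataSet, comparing the f0 and f1 fields of each Tuple2. The order inside each pair follows string ordering rather than input order, which should not matter for a symmetric similarity measure.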

Upvotes: 1
