Salim Fadhley

Reputation: 8175

Can I write a Scala function that returns an n-tuple where n is defined by an argument?

I'm trying to write a Scala function that generates n-grams from tweets.

The function takes two arguments: a list of strings (the tweets we want to examine) and an integer n. If n is 2 (the default), the result should be a HashMultiset of 2-tuples; if n is 3, the result should be a HashMultiset of 3-tuples, and so on.

Is there any way to define such a function? I'd like to be explicit with my typing, so I'd prefer not to just define the function as returning a Multiset of Any.

Here's the stub function I have so far. It only works for n == 2:

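import com.google.common.collect.HashMultiset  // assuming Guava's HashMultiset

// Stub: the return type is hard-coded to 2-tuples, so n is ignored for now.
def extract_ngrams(tweets: List[String], n: Int = 2): HashMultiset[(String, String)] = {
  val result = HashMultiset.create[(String, String)]()
  result.add(("a", "a"))  // placeholder entry
  result
}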

Upvotes: 3

Views: 924

Answers (3)

Rich Henry

Reputation: 1849

I would suggest that a tuple may be the wrong data structure. A case class could solve this neatly:

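import scala.collection.mutable

// A case class with a varargs field can hold an n-gram of any length.
case class Data(v: String*)

def makeData(v: String*) = {
  Data(v: _*)
}

// Must be a mutable Set: += on an immutable Set held in a val does not compile.
val s = mutable.Set[Data]()

s += makeData("a", "b")
s += makeData("c", "d", "e")

for(i <- s) i match {
  case Data(v @ _*) => println(v)
}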

Upvotes: 1

Alex Pakka

Reputation: 9706

Tuples in Scala only go up to arity 22, so even if this were possible, it would only allow values of n from 2 to 22.

I would instead simply return HashMultiset[Array[String]], and you can use n to define your result arity: val result = HashMultiset.create[Array[String]]()

You could then map it to tuples depending on the use case when needed.
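As a rough illustration of that mapping step, a sequence-based n-gram can be turned into a tuple by pattern matching once the arity is known at the call site. The object name `NGramTuples` and the helpers `toPair` and `toTriple` are made up for this sketch and are not part of any library.

```scala
object NGramTuples {
  // Hypothetical helpers: convert a sequence-based n-gram to a fixed-arity tuple.
  // Each returns None when the sequence does not have the expected length.
  def toPair(gram: Seq[String]): Option[(String, String)] = gram match {
    case Seq(a, b) => Some((a, b))
    case _         => None
  }

  def toTriple(gram: Seq[String]): Option[(String, String, String)] = gram match {
    case Seq(a, b, c) => Some((a, b, c))
    case _            => None
  }
}
```

Returning an Option keeps the conversion total: a caller that asked for bigrams but received a trigram gets None instead of a MatchError.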

Update

If I understand what you need, I would do something like this:

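def extract_ngrams(tweets: List[String], n: Int = 2): Map[List[String], Int] = {
  // Slide a window of size n over the elements of tweets, then count each distinct window.
  // map (rather than mapValues) yields a strict Map on Scala 2.13 as well.
  tweets.sliding(n).toList.groupBy(identity).map { case (gram, hits) => gram -> hits.length }
}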
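Note that the function above slides the window over the elements of the list itself, so it treats each list element as one token. If each tweet is a whole sentence, the n-grams probably need to be built from the words inside each tweet. A minimal sketch of that variant follows; the names `NGrams` and `extractNgrams` are hypothetical, and splitting on whitespace is an assumption (real tweets would likely need a proper tokenizer).

```scala
object NGrams {
  // Counts word n-grams across all tweets.
  // Assumption: words are separated by whitespace.
  def extractNgrams(tweets: List[String], n: Int = 2): Map[List[String], Int] = {
    require(n > 0, "n must be positive")
    tweets
      .flatMap { tweet =>
        val words = tweet.split("\\s+").filter(_.nonEmpty).toList
        // Skip tweets shorter than n, since sliding would otherwise
        // yield one short window instead of a full n-gram.
        if (words.length < n) Nil else words.sliding(n).toList
      }
      .groupBy(identity)
      .map { case (gram, occurrences) => gram -> occurrences.length }
  }
}
```

For example, `NGrams.extractNgrams(List("a b a b", "a b"))` counts the bigram `List("a", "b")` three times and `List("b", "a")` once. Using List[String] as the key keeps the element type explicit while letting n vary at runtime.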

Upvotes: 6

Ben Reich

Reputation: 16324

This isn't possible with the native Scala libraries. You can use something like shapeless if this functionality is important to you.

The common supertype of the tuples you're describing is Product with Serializable, so you could return a HashMultiset[Product with Serializable] if you wanted to, but you're probably better off just returning a HashMultiset[Seq[String]] or HashMultiset[Map[Int, String]] or HashMultiset[Array[String]].
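To make the trade-off concrete, here is a small sketch comparing the two typings. The object and value names are invented for illustration, and the list is annotated simply as Product to keep the example short.

```scala
object TupleSupertype {
  // Tuples of different arity share only the general Product interface,
  // so the element types and the arity are no longer known statically.
  val mixed: List[Product] = List(("a", "b"), ("a", "b", "c"))

  // Elements can still be inspected at runtime, but only as Any.
  val arities: List[Int] = mixed.map(_.productArity)
  val firstAsList: List[Any] = mixed.head.productIterator.toList

  // With Seq[String] the element type is preserved and the length may vary.
  val grams: List[Seq[String]] = List(Seq("a", "b"), Seq("a", "b", "c"))
  val lengths: List[Int] = grams.map(_.length)
}
```

One caveat when choosing among the suggested element types: an Array compares by reference on the JVM, so two arrays with the same words count as different elements in a hash-based collection. Seq or List is usually the safer choice when the goal is counting duplicates.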

Upvotes: 5
