Daniel
Daniel

Reputation: 2467

Scala: Splitting Lists into multiple parts

assume I have a list of tuples...

val l: List[(String, String, Date)] 

...this list gets sorted by date...

val sorted = l.sortWith((a, b) => a._3 < b._3 )

And now I want to split this sorted list into multiple list. The split should happen between tuples where the date difference is greater then 3 days. What would be a good and performant way to do that?

Thanks and regards!

EDIT:

Here is an example: Input (already sorted):

List(("a1", "b1", "2016-01-30"), ("a2", "b2", "2016-02-01"), ("a3", "b3", "2016-02-20"), ("a4", "b4", "2016-02-23"), ("a5", "b5", "2016-02-25"))

Expected output:

List(List(("a1", "b1", "2016-01-30"), ("a2", "b2", "2016-02-01")), List(("a3", "b3", "2016-02-20"), ("a4", "b4", "2016-02-23"), ("a5", "b5", "2016-02-25")))

Upvotes: 3

Views: 2677

Answers (4)

jwvh
jwvh

Reputation: 51271

Gee, if this is a party I might as well throw my 2-cents around.

sorted.init.foldRight(List(List(sorted.last))){ (tup,acc) => 
  if (acc.head.head._3 - tup._3 > /*test for time-gap here*/) 
    List(tup)::acc  // gap too big, start new sub-List
  else
    (tup::acc.head)::acc.tail  // prepend to current sub-List
}

Upvotes: 3

dhg
dhg

Reputation: 52701

Here is a very clean linear-time (single-pass) approach:

type Date = Int  // For simplicity in the example

val sorted: List[(String,String,Date)] = List(("a1", "b1", 1),
                                              ("a1", "b1", 2),
                                              ("a1", "b1", 6),
                                              ("a1", "b1", 8),
                                              ("a1", "b1", 10),
                                              ("a1", "b1", 15),
                                              ("a1", "b1", 16))

val result = sorted.sliding(2).foldLeft(Vector(Vector(sorted.head))) { 
  case (z, List(t1, t2)) => 
    if (t2._3 - t1._3 > 3) z :+ Vector(t2)
    else                   z.init :+ (z.last :+ t2)
}

result.foreach(println)
// Vector((a1,b1,1), (a1,b1,2))
// Vector((a1,b1,6), (a1,b1,8), (a1,b1,10))
// Vector((a1,b1,15), (a1,b1,16))

You can handle the special cases where sorted.length() < 2 separately.

Upvotes: 0

Iadams
Iadams

Reputation: 546

For convenience, I've substitued Ints for Dates, but the prinicpal is the same.

   val data = List(
      ("a","a", 10),
      ("a","b", 30),
      ("a","b", 11),
      ("a","b", 33),
      ("s","c", 37),
      ("a","c", 26),
      ("a","d", 22),
      ("m","a", 18),
      ("t","a", 15)
    )
    val sortedData = data.sortWith ((a,b)=> a._3 < b._3)
    println(s"$sortedData")
    val splitPass = sortedData.foldLeft(List[(String, String,Int)](), List[List[(String, String,Int)]](), sortedData.head._3){
      case ((cl, acc, ld),nt) =>
        if (nt._3-ld>3)
          (List(nt), cl.reverse ::acc, nt._3)
        else
          (nt:: cl, acc, nt._3)
    }
    val (fl, fa, _) = splitPass
    val res = (if (fl.isEmpty) fa else fl :: fa).reverse
    println(s"$res")

This gives the sorted list:

List((a,a,10), (a,b,11), (t,a,15), (m,a,18), (a,d,22), (a,c,26), (a,b,30), (a,b,33), (s,c,37))

and the result list:

List(List((a,a,10), (a,b,11)), List((t,a,15), (m,a,18)), List((a,d,22)), List((a,c,26)), List((a,b,30), (a,b,33)), List((s,c,37)))

What this does is a single pass through the sorted list, building an accumulator consisting of (List of items in the current group, List of completed groups, Int[Date] of last added item). This is seeded with two empty lists and the Date of the first item in the list.

For each item in the source list, if it's close to the previous one, it gets added to the Current group, if it's far from the previous item, the current group is closed and added to the completed gorups list, and the current item becoes the first item in a new list, and the current item's date becomes the reference date for the next check. If you wanted to break where the date was different to far from the earliest in the current group, it is easy to change the else branch to pass ld instead of nt._3.

At the end, you need to add any uncompleted group to the final collection of groups.

The two '.reverse's are necessary because the lists are, in the typcial functional style, built backwards (becuase it's cheaper that way) and reversed at completion.

Upvotes: 0

devkat
devkat

Reputation: 1644

I have simplified the types a bit, the code just illustrates the concept. The performance is probably better without the toList conversion.

type MyTuple = (String, Int)

val sorted: List[MyTuple] = List(
    ("a", 1),
    ("a", 2),
    ("a", 3),
    ("b", 7),
    ("b", 9),
    ("c", 13),
    ("c", 15),
    ("c", 16)
)

def test(a: MyTuple, b: MyTuple): Boolean = b._2 - a._2 > 3

// We prepend the head so that the first sliding pair will have distance 0
val lists = (sorted.head :: sorted)
  .sliding(2)
  .map { case List(a, b) => (b, test(a, b)) }
  .toList

def split(list: List[(MyTuple, Boolean)]): List[List[MyTuple]] = list match {
  case Nil => Nil
  case head :: tail => {
    val (l1, l2) = tail.span(!_._2)
    (head :: l1).map(_._1) :: split(l2)
  }
}

val splitLists = split(lists).map(_.map(_._1))
println(splitLists.mkString("\n"))

Output:

List(a, a, a)
List(b, b)
List(c, c, c)

Upvotes: 0

Related Questions