Abir Chokraborty
Abir Chokraborty

Reputation: 1765

What is the efficient way to remove subsets from a List[List[String]]?

I have a ListBuffer of List[String], val tList = ListBuffer[TCount] where TCount is case class TCount(l: List[String], c: Long). I want to find those list l from tList which are not the subset of any other element of tlist and their c value is less than their superset c value. The following program works but I have to use two for loop that makes the code inefficient. Is there any better approach I can use to make the code efficient?

    val _arr = tList.toArray

    for (i <- 0 to (_arr.length - 1)) {
      val il = _arr(i).l.toSet
      val ic = _arr(i).c
      for (j <- 0 to (_arr.length - 1)) {
        val jl = _arr(j).toSet
        val jc = _arr(j).c
        if (i != j && il.subsetOf(jl) && ic >= jc) { 
          tList.-=(_arr(i))
        }
      }
    }

Upvotes: 2

Views: 548

Answers (4)

Leo C
Leo C

Reputation: 22449

Here's one approach:

  1. Create an indexed Map for identifying the original List elements
  2. Turn Map of List-elements into Map of Sets (with index)
  3. Generate combinations of the Map elements and use a custom filter to capture the elements that are subset of others
  4. Remove those subset elements from the Map of Sets and retrieve remaining elements from the Map of Lists via the index

Sample code:

type TupIntSet = Tuple2[Int, Set[Int]]

def subsetFilter(ls: List[TupIntSet]): List[TupIntSet] = 
  if ( ls.size != 2 ) List.empty[TupIntSet] else
    if ( ls(0)._2 subsetOf ls(1)._2 ) List[TupIntSet]((ls(0)._1, ls(0)._2)) else
      if ( ls(1)._2 subsetOf ls(0)._2 ) List[TupIntSet]((ls(1)._1, ls(1)._2)) else
        List.empty[TupIntSet]

val tList = List(List(1,2), List(1,2,3), List(3,4,5), List(5,4,3), List(2,3,4), List(6,7))

val listMap = (Stream from 1).zip(tList).toMap
val setMap = listMap.map{ case (i, l) => (i, l.toSet) }

val tSubsets = setMap.toList.combinations(2).toSet.flatMap(subsetFilter)

val resultList = (setMap.toSet -- tSubsets).map(_._1).map(listMap.getOrElse(_, ""))
// resultList: scala.collection.immutable.Set[java.io.Serializable] =
//   Set(List(5, 4, 3), List(2, 3, 4), List(6, 7), List(1, 2, 3))

Upvotes: 0

Andrey Tyukin
Andrey Tyukin

Reputation: 44918

Here is a method that relies on a data structure somewhat similar to Set-Trie, but which stores more subsets explicitly. It provides worse compression, but is faster during lookup:

def findMaximal(lists: List[List[String]]): List[List[String]] = {

  import collection.mutable.HashMap

  class Node(
    var isSubset: Boolean = false, 
    val children: HashMap[String, Node] = HashMap.empty
  ) {
    def insert(xs: List[String], isSubs: Boolean): Unit = if (xs.isEmpty) {
      isSubset |= isSubs
    } else {
      var isSubsSubs = false || isSubs
      for (h :: t <- xs.tails) {
        children.getOrElseUpdate(h, new Node()).insert(t, isSubsSubs)
        isSubsSubs = true
      }
    }
    def isMaximal(xs: List[String]): Boolean = xs match {
      case Nil => children.isEmpty && !isSubset
      case h :: t => children(h).isMaximal(t)
    }
    override def toString: String = {
      if (children.isEmpty) "#"
      else children.flatMap{ 
        case (k,v) => {
          if (v.children.isEmpty) List(k)
          else (k + ":") :: v.toString.split("\n").map("  " + _).toList
        }
      }.mkString("\n")
    }
  }

  val listsWithSorted = for (x <- lists) yield (x, x.sorted)
  val root = new Node()
  for ((x, s) <- listsWithSorted) root.insert(s, false)

  // println(root)

  for ((x, s) <- listsWithSorted; if root.isMaximal(s)) yield x
}

Note that I'm allowed to do any kind of mutable nonsense inside the body of the method, because the mutable trie data structure never escapes the scope of the method, and can therefore not be inadvertently shared with another thread.

Here is an example with sets of characters (converted to lists of strings):

println(findMaximal(List(
  "ab", "abc", "ac", "abd",
  "ade", "efd", "adf", "bafd",
  "abd", "fda", "dba", "dbe"
).map(_.toList.map(_.toString))))

The output is:

List(
  List(a, b, c), 
  List(a, d, e), 
  List(e, f, d), 
  List(b, a, f, d), 
  List(d, b, e)
)

so indeed, the non-maximal elements ab, ac, abd, adf, fda and dba are eliminated.

And here is what my not-quite-set-trie data structure looks like (child nodes are indented):

e:
  f
b:
  e
  d:
    e
    f
  c
  f
d:
  e:
    f
  f
a:
  e
  b:
    d:
      f
    c
    f
  d:
    e
    f
  c
  f
c
f

Upvotes: 1

Joe K
Joe K

Reputation: 18424

Inspired by the set-trie comment:

import scala.collection.SortedMap

class SetTrie[A](val flag: Boolean, val children: SortedMap[A, SetTrie[A]])(implicit val ord: Ordering[A]) {
  def insert(xs: List[A]): SetTrie[A] = xs match {
    case Nil => new SetTrie(true, children)
    case a :: rest => {
      val current = children.getOrElse(a, new SetTrie[A](false, SortedMap.empty))
      val inserted = current.insert(rest)
      new SetTrie(flag, children + (a -> inserted))
    }
  }

  def containsSuperset(xs: List[A], strict: Boolean): Boolean = xs match {
    case Nil => !children.isEmpty || (!strict && flag)
    case a :: rest => {
      children.get(a).map(_.containsSuperset(rest, strict)).getOrElse(false) ||
        children.takeWhile(x => ord.lt(x._1, a)).exists(_._2.containsSuperset(xs, false))
    }
  }
}

def removeSubsets[A : Ordering](xss: List[List[A]]): List[List[A]] = {
  val sorted = xss.map(_.sorted)
  val setTrie = sorted.foldLeft(new SetTrie[A](false, SortedMap.empty)) { case (st, xs) => st.insert(xs) }
  sorted.filterNot(xs => setTrie.containsSuperset(xs, true))
}

Upvotes: 1

Gabriel Francisco
Gabriel Francisco

Reputation: 350

Not sure if you can avoid the complexity, but, I guess I'd write like this:

val tList = List(List(1, 2, 3), List(3, 2, 1), List(9, 4, 7), List(3, 5, 6), List(1, 5, 6), List(6, 1, 5))

val tSet = tList.map(_.toSet)
def result = tSet.filterNot { sub => tSet.count(_.subsetOf(sub)) > 1 }

Upvotes: 0

Related Questions