Reputation: 1389
I have a list of tuple list which have overlapping elements.
val tupLis:Seq[(List[(Integer,Char)],Int)] = null//data
I am trying to merge overlapping elements in the tuple list. Here is a code that i am working on which uses foldleft to merger overlapping tuple list from the list.The merging doesn't seam to be working as it misses out some elements of the tuple list.Each tuple list contains 4 tuples present in them. Each tuple list in the list often overlapping as they are generated from a bigger list using sliding function.
val alLis:Seq[(List[(Integer,Char)],Int)] = snGrMap.map(_._2).flatten.toList.sortBy(_._1.head._1)
val res = alLis.foldLeft(mutable.HashMap.empty[Int,(List[Integer],List[(Integer,Char)],Int)]) { (map, value) =>
if(map.size<=0){
map.put(0,(value._1.map(_._1),value._1,value._2))
}else{
val cads = map.filter(p=>value._1.intersect(p._2._2).size>=3)
if(cads.size>=1) {
cads.foreach { i =>
val cmnPos = i._2._1.intersect(value._1.map(_._1))
val cmnBase = i._2._2.filter(p=>cmnPos.contains(p._1)).intersect(value._1.filter(p=>cmnPos.contains(p._1)))
println(cmnBase.size,cmnPos.size,value._1, i._2._2)
if(cmnBase.size == cmnPos.size)
map.put(i._1,((i._2._1++value._1.map(_._1)).distinct,(i._2._2++value._1).distinct,i._2._3+value._2))
else
map.put(map.size,(value._1.map(_._1),value._1,value._2))
}
}else{
map.put(map.size,(value._1.map(_._1),value._1,value._2))
}
}
map
}
here is the example data which i am using:
(List((306,c), (328,g), (336,a), (346,g)),282)
(List((306,g), (328,c), (336,g), (346,a)),22)
(List((306,c), (328,c), (336,g), (346,a)),4)
(List((328,g), (336,a), (346,g), (348,t)),164)
(List((328,g), (336,a), (346,g), (348,c)),161)
(List((328,c), (336,g), (346,a), (348,c)),28)
(List((336,a), (346,g), (348,t), (358,a)),168)
(List((336,a), (346,g), (348,c), (358,a)),154)
(List((336,g), (346,a), (348,c), (358,g)),30)
(List((346,g), (348,t), (358,a), (361,c)),178)
(List((346,g), (348,c), (358,a), (361,c)),166)
(List((346,a), (348,c), (358,g), (361,g)),34)
The merged list look like:
List((306,c), (328,g), (336,a), (346,g), (348,t), (358,a), (361,c),792)
List((306,c), (328,g), (336,a), (346,g), (348,c), (358,a), (361,c) ),763)
List((306,g), (328,c), (336,g), (346,a), (348,c), (358,g), (361,g) ),96)
Update 1:
Overlap: If two list of tuples have the 3 or more exact tuples present in both list,then they are supposed to be overlapping list of tuples.But there should not be any difference when two list is merged.If one of the tuple values in both list have same integer but different char,then they shall not be merged. Merge: Combining two or more list of tuple list when they overlap.
Update 2: I have come up with a small solution,but not sure how efficient is it.
val alLisWithIndex = alLis.zipWithIndex
val interGrps = new ListBuffer[(Int,Int)]()
alLisWithIndex.foreach{i=>
val cads = alLisWithIndex.filter(p=>p._1._1.take(3).intersect(i._1._1.takeRight(3)).size>=3)
cads.foreach(p=>interGrps.append((i._2,p._2)))
}
println(interGrps.sortBy(_._1))
so when i print the above code, i get tuple list grouped in this manner. i have printed only the index of the each tuple group that should be merged.
Result generated: ListBuffer((0,2), (0,3), (1,4), (2,5), (3,6), (4,7), (5,8), (6,9), (7,10))
here is the list of tuples with their index that was used
List(((List((306,c), (328,g), (336,a), (346,g)),282),0),
((List((306,g), (328,c), (336,g), (346,a)),22),1),
((List((328,g), (336,a), (346,g), (348,t)),164),2),
((List((328,g), (336,a), (346,g), (348,c)),161),3),
((List((328,c), (336,g), (346,a), (348,c)),28),4),
((List((336,a), (346,g), (348,t), (358,a)),168),5),
((List((336,a), (346,g), (348,c), (358,a)),154),6),
((List((336,g), (346,a), (348,c), (358,g)),30),7),
((List((346,g), (348,t), (358,a), (361,c)),178),8),
((List((346,g), (348,c), (358,a), (361,c)),166),9),
((List((346,a), (348,c), (358,g), (361,g)),34),10))
So now all i had to do was use the interGrps
,link the groups based on second value and finally replace the indexes with list of tuples..
Upvotes: 0
Views: 1036
Reputation: 41779
The following code follows, I think, the description of your algorithm. However, it does not give the same output so there's something still to clear up about wha you want
First, the test data
var xs = List(
(List((306,"c"), (328,"g"), (336,"a"), (346,"g")),282),
(List((306,"g"), (328,"c"), (336,"g"), (346,"a")),22),
(List((306,"c"), (328,"c"), (336,"g"), (346,"a")),4),
(List((328,"g"), (336,"a"), (346,"g"), (348,"t")),164),
(List((328,"g"), (336,"a"), (346,"g"), (348,"c")),161),
(List((328,"c"), (336,"g"), (346,"a"), (348,"c")),28),
(List((336,"a"), (346,"g"), (348,"t"), (358,"a")),168),
(List((336,"a"), (346,"g"), (348,"c"), (358,"a")),154),
(List((336,"g"), (346,"a"), (348,"c"), (358,"g")),30),
(List((346,"g"), (348,"t"), (358,"a"), (361,"c")),178),
(List((346,"g"), (348,"c"), (358,"a"), (361,"c")),166),
(List((346,"a"), (348,"c"), (358,"g"), (361,"g")),34))
Now a method to implement " If two list of tuples have the 3 or more exact tuples present in both list,then they are supposed to be overlapping list of tuples. "
def isOverlap[A](a:(List[A],Int),b:(List[A],Int)) = (a._1 intersect b._1).size >= 3
Then, using something I wrote over here that groups elements that "match" according to the predicate
def groupWith[A](xs: List[A], f: (A, A) => Boolean) = {
// helper function to add "e" to any list with a member that matches the predicate
// otherwise add it to a list of its own
def addtoGroup(gs: List[List[A]], e: A): List[List[A]] = {
val (before, after) = gs.span(_.exists(!f(_, e)))
if (after.isEmpty)
List(e) :: gs
else
before ::: (e :: after.head) :: after.tail
}
// now a simple foldLeft adding each element to the appropriate list
xs.foldLeft(Nil: List[List[A]])(addtoGroup)
}
we can get a list of lists of overlapping elements
List(List((List((346,g), (348,c), (358,a), (361,c)),166),
(List((346,g), (348,t), (358,a), (361,c)),178)),
List((List((346,a), (348,c), (358,g), (361,g)),34),
(List((336,g), (346,a), (348,c), (358,g)),30)),
List((List((336,a), (346,g), (348,c), (358,a)),154),
(List((336,a), (346,g), (348,t), (358,a)),168)),
List((List((328,c), (336,g), (346,a), (348,c)),28),
(List((306,c), (328,c), (336,g), (346,a)),4),
(List((306,g), (328,c), (336,g), (346,a)),22)),
List((List((328,g), (336,a), (346,g), (348,c)),161),
(List((328,g), (336,a), (346,g), (348,t)),164),
(List((306,c), (328,g), (336,a), (346,g)),282)))
Then we write a function to merge a list of overlapping tuples:
def merge(ys: List[(List[(Int, String)], Int)]) =
ys.foldLeft((Nil:List[(Int, String)], 0))
{(acc, e) => ((acc._1 ++ (e._1 diff acc._1)).sorted, acc._2 + e._2)}
(doing a union of the tuples by adding any that aren't already in the accumulated result, and adding up the ints. The .sorted
is just to make it easier to visually review the result)
Then merge the overlapping entries
ms.map(merge)
Giving this, but that isn't the output you have?
List((List((346,g), (348,c), (348, t), (358,a), (361,c)),344),
(List((336,g), (346,a), (348,c), (358,g), (361, g)),64),
(List((336,a), (346,g), (348,c), (348,t), (358,a)),322),
(List((306,c), (306,g), (328,c), (336,g), (346,a), (348,c)),54),
(List((306,c), (328,g), (336,a), (346,g), (348,c), (348,t)),607))
EDIT: Following the comments, here's an updated isOverlap. However, it means less overlaps than the original, so more elements in the final merged output, so it's still not right:
def isOverlap(a:(List[(Int, String)],Int),b:(List[(Int, String)],Int)) =
// combine the tuples by Int, and check that we don't get two entries
// for any Int (i.e. if we do, they have different Strings so it's not an overlap)
!((a._1++b._1).groupBy(_._2).exists(_._2.length > 1)) &&
// check there are at least 2 matching tuples
(a._1 intersect b._1).size >= 3
Upvotes: 2