Reputation: 73
My data has the following format:
{sentenceA1}{tab}{sentenceB1}
{sentenceA2}{tab}{sentenceB1}
{sentenceA3}{tab}{sentenceB2}
{sentenceA4}{tab}{sentenceB2}
and I want to get, for each sentence B, the array of A sentences that match it, using Scala:
[sentenceA1, sentenceA2]
[sentenceA3, sentenceA4]
I tried the following:
val file1 = file.map(line => line.split("\t"))
val file2 = file1.map(line => (line(1), line(0)))
file2.reduceLeft(_+_).collect
but it does not work.
Upvotes: 0
Views: 505
Reputation: 20435
Consider also a container class
case class Text(s:String) {
val Array(a,b,_*) = s.split("\t")
}
that splits each element of a List[String]; thus
for ( (k,xs) <- lines.map(Text(_)).groupBy(_.b) ) yield k -> xs.map(_.a)
delivers the desired association.
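For reference, here is a minimal, self-contained sketch of the container-class idea. The wrapper object ContainerClassSketch and the sample lines are assumptions added for illustration, and the fields are read by index rather than with the Array(a,b,_*) pattern, so that the sketch compiles without warnings on recent Scala versions as well.

```scala
// Hypothetical wrapper object; only Text, a and b come from the answer above.
object ContainerClassSketch {
  // Holds one raw line and exposes its two tab-separated fields.
  final case class Text(s: String) {
    private val fields = s.split("\t")
    val a: String = fields(0) // sentence A (first column)
    val b: String = fields(1) // sentence B (second column)
  }

  // Sample input, assumed for illustration.
  val lines: List[String] = List(
    "sentenceA1\tsentenceB1",
    "sentenceA2\tsentenceB1",
    "sentenceA3\tsentenceB2",
    "sentenceA4\tsentenceB2"
  )

  // Group the parsed lines by B, then keep only the A side of each group.
  val grouped: Map[String, List[String]] =
    for ((k, xs) <- lines.map(Text(_)).groupBy(_.b)) yield k -> xs.map(_.a)

  def main(args: Array[String]): Unit =
    grouped.foreach { case (b, as) => println(s"$b -> ${as.mkString("[", ", ", "]")}") }
}
```

Note that a line with no tab would throw an ArrayIndexOutOfBoundsException here, so this sketch assumes well-formed input.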
Upvotes: 0
Reputation: 7735
You can do it like this:
list.map(line => line.split("\t")).
map(a => a(1) -> a(0)).
groupBy(_._1).
mapValues(_.map(_._2))
Or
list.map(line => line.split("\t")).
groupBy(_(1)).
mapValues(_.map(_(0)))
And you'll get a map
{sentenceB1} -> {sentenceA1, sentenceA2}
{sentenceB2} -> {sentenceA3, sentenceA4}
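One caveat: on Scala 2.13 and later, mapValues on a Map is deprecated and returns a lazy view rather than a strict Map. The sketch below is one way to write the same grouping with a plain map over the groups, so it behaves the same on older and newer versions. The names GroupByTabSketch and groupByB are hypothetical, and skipping lines that lack a tab is an assumption of this sketch, not something the question asked for.

```scala
// Hypothetical helper object, added for illustration.
object GroupByTabSketch {
  // Groups the first column of each tab-separated line by the second column.
  // Lines with fewer than two fields are skipped (an assumption of this sketch).
  def groupByB(lines: List[String]): Map[String, List[String]] =
    lines
      .map(_.split("\t"))
      .collect { case Array(a, b, _*) => b -> a } // (sentenceB, sentenceA)
      .groupBy(_._1)
      .map { case (b, pairs) => b -> pairs.map(_._2) }

  def main(args: Array[String]): Unit = {
    val sample = List(
      "sentenceA1\tsentenceB1",
      "sentenceA2\tsentenceB1",
      "sentenceA3\tsentenceB2",
      "sentenceA4\tsentenceB2"
    )
    groupByB(sample).foreach { case (b, as) => println(s"$b -> $as") }
  }
}
```

Using collect with a pattern instead of indexing with a(1) avoids an exception on malformed lines, at the cost of silently dropping them.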
Upvotes: 4
Reputation: 10681
Read your lines from the data source (I use a predefined list to simplify the example):
val lines = List(
"sentenceA1\tsentenceB1",
"sentenceA2\tsentenceB1",
"sentenceA3\tsentenceB2",
"sentenceA4\tsentenceB2"
)
Then process each line. The code looks like this:
val result = lines
.map(_.split("\t"))
.groupBy(_(1))
.mapValues( _.map(_(0)))
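If Scala 2.13 or later is available (an assumption, since the question does not state a version), groupMap performs the grouping and the projection in one step. The sketch below also turns the resulting map into the arrays the question asks for. The object name GroupMapSketch and the sorting by the B key are illustrative choices, because a Map does not guarantee any key order.

```scala
// Hypothetical wrapper object, added for illustration. Requires Scala 2.13+.
object GroupMapSketch {
  val lines: List[String] = List(
    "sentenceA1\tsentenceB1",
    "sentenceA2\tsentenceB1",
    "sentenceA3\tsentenceB2",
    "sentenceA4\tsentenceB2"
  )

  // groupMap groups by the key function (column 1, sentence B)
  // and maps each grouped element (to column 0, sentence A).
  val result: Map[String, List[String]] =
    lines.map(_.split("\t")).groupMap(_(1))(_(0))

  // One array of A sentences per B sentence, ordered by the B key.
  val arrays: List[Array[String]] =
    result.toList.sortBy(_._1).map(_._2.toArray)

  def main(args: Array[String]): Unit =
    arrays.foreach(arr => println(arr.mkString("[", ", ", "]")))
}
```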
Upvotes: 1