Jonathan

Reputation: 73

Scala mapping and reducing to an Array

My data has the following format:

{sentenceA1}{tab}{sentenceB1}  
{sentenceA2}{tab}{sentenceB1}  
{sentenceA3}{tab}{sentenceB2}  
{sentenceA4}{tab}{sentenceB2}  

and, using Scala, I want to get an array of the A sentences that share each B sentence:

[sentenceA1, sentenceA2]  
[sentenceA3, sentenceA4]

I tried the following:

val file1 = file.map(line => line.split("\t"))
val file2 = file1.map(line => (line(1), line(0)))
file2.reduceLeft(_+_).collect

but it does not work.

Upvotes: 0

Views: 505

Answers (3)

elm

Reputation: 20435

Consider also a container class

case class Text(s:String) {
  val Array(a,b,_*) = s.split("\t") 
}

that splits each element of a List[String] on the tab character; then

for ( (k,xs) <- lines.map(Text(_)).groupBy(_.b) ) yield k -> xs.map(_.a) 

delivers the desired association.
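
For anyone who wants to try this end to end, here is a minimal sketch under a few assumptions: sampleLines is made-up input standing in for lines, and the container class uses plain indexing instead of the pattern binding above, which should behave the same on well-formed lines.

```scala
// Hypothetical sample input; the answer assumes a List[String] called lines.
val sampleLines = List(
  "sentenceA1\tsentenceB1",
  "sentenceA2\tsentenceB1",
  "sentenceA3\tsentenceB2",
  "sentenceA4\tsentenceB2"
)

// Container class: a is the first tab-separated field, b is the second.
// A line with fewer than two fields throws when the missing field is indexed.
case class Text(s: String) {
  private val fields: Array[String] = s.split("\t")
  val a: String = fields(0)
  val b: String = fields(1)
}

// Group the wrapped lines by their B sentence and keep only the A sentences.
val grouped: Map[String, List[String]] =
  for ((k, xs) <- sampleLines.map(Text(_)).groupBy(_.b)) yield k -> xs.map(_.a)
```

Looking up grouped("sentenceB1") should then give the two A sentences that were paired with sentenceB1.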

Upvotes: 0

mavarazy

Reputation: 7735

You can do it like this:

list.map(line => line.split("\t")).
    map(a => a(1) -> a(0)).
    groupBy(_._1).
    mapValues(_.map(_._2))

Or

list.map(line => line.split("\t")).
    groupBy(_(1)).
    mapValues(_.map(_(0)))

And you'll get a map from each B sentence to its A sentences:

{sentenceB1} -> {sentenceA1, sentenceA2}
{sentenceB2} -> {sentenceA3, sentenceA4}
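
Since the question asks for arrays rather than a map, here is a rough sketch of one way to go the last step. The names sampleList, byB and arrays are made up for the example. It also uses map with a pattern match instead of mapValues, because mapValues on Map is deprecated since Scala 2.13 and returns a lazy view in earlier versions.

```scala
// Hypothetical sample input standing in for list.
val sampleList = List(
  "sentenceA1\tsentenceB1",
  "sentenceA2\tsentenceB1",
  "sentenceA3\tsentenceB2",
  "sentenceA4\tsentenceB2"
)

// Group the split lines by the second field and keep the first field of each.
val byB: Map[String, List[String]] =
  sampleList
    .map(_.split("\t"))
    .groupBy(_(1))
    .map { case (b, rows) => b -> rows.map(_(0)) }

// Drop the keys and turn each group into an Array.
// The order of the groups is not guaranteed, because Map does not keep insertion order.
val arrays: List[Array[String]] = byB.values.map(_.toArray).toList
```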

Upvotes: 4

Sergii Lagutin

Reputation: 10681

Read your lines from the data source (I use a predefined list to simplify the example):

val lines = List(
  "sentenceA1\tsentenceB1",
  "sentenceA2\tsentenceB1",
  "sentenceA3\tsentenceB2",
  "sentenceA4\tsentenceB2"
)

Process each line:

  • Split each line on the tab character.
  • Group the split lines by their second token.
  • Simplify the grouped values by keeping only the first token of each line.

The code looks like this:

val result = lines
  .map(_.split("\t"))
  .groupBy(_(1))
  .mapValues(_.map(_(0)))
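
One caveat with indexing like _(1): a line without a tab throws an ArrayIndexOutOfBoundsException. If the real data might contain such lines, a possible variant is to filter with collect and a pattern match. This is only a sketch; rawLines and safeResult are made-up names, and silently skipping bad lines is an assumption about what you want.

```scala
// Hypothetical input with one malformed line (no tab).
val rawLines = List(
  "sentenceA1\tsentenceB1",
  "sentenceA2\tsentenceB1",
  "no tab on this line",
  "sentenceA3\tsentenceB2",
  "sentenceA4\tsentenceB2"
)

val safeResult: Map[String, List[String]] =
  rawLines
    .map(_.split("\t"))
    // Keep only lines with at least two fields, as (B, A) pairs.
    .collect { case Array(a, b, _*) => (b, a) }
    // Group the pairs by B, then keep only the A part of each pair.
    .groupBy(_._1)
    .map { case (b, pairs) => b -> pairs.map(_._2) }
```

Here the malformed line is dropped and the two groups come out the same as before.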

Upvotes: 1
