nicholasnet
nicholasnet

Reputation: 2287

How to sort lines on a value in them

I am a newbie trying to learn Scala after using PHP for many years. Problem I am trying to solve is really simple but not sure how to do in Scala. Basically I am reading content from a file which have person first name, last name and grades. I need to read the file and sort the names by grades.

File is like this.

Aoe Samore 3.1
Boe Sbmore 2.2
Coe Scmore 3.9
Doe Sdmore 2.4
Eoe Semore 3.5
Foe Sfmore 2.6
Goe Sgmore 3.7
Hoe Shmore 2.9
Ioe Simore 3.1
Joe Sjmore 1.2
Koe Skmore 3.2
Loe Slmore 4.0

End result should be just display like this

Loe Slmore 4
Coe Scmore 3.9
Goe Sgmore 3.7
Eoe Semore 3.5
Koe Skmore 3.2
Aoe Samore 3.1
Ioe Simore 3.1
Hoe Shmore 2.9
Foe Sfmore 2.6
Doe Sdmore 2.4
Boe Sbmore 2.2
Joe Sjmore 1.2

This is how I did in PHP

$content = file_get_contents('grd.txt');
$lines = explode("\n",$content);

$grades_with_information = [];

foreach($lines as $line) {
    $temp = explode(' ',$line);
    $temp[2] = (float)$temp[2];
    $grades_with_information[] = $temp;
}
usort($grades_with_information, function($a, $b) {
    if ($a[2] == $b[2]) {
        return 0;
    }
    return ($a[2] > $b[2]) ? -1 : 1;
});

foreach($grades_with_information as $grade_with_information){
    echo  $grade_with_information[0].' '
          .$grade_with_information[1].' '
          .$grade_with_information[2]
          .'<br>';

}

How we about doing this in Scala so far I have done this

val source = scala.io.Source.fromFile("grd.txt")
val lines = source.getLines()
for (a <- lines) {

}
source.close()

Any suggestion/help/clues?

Upvotes: 0

Views: 175

Answers (4)

AmigoNico
AmigoNico

Reputation: 6862

If this is throwaway code and you don't care how rows with equal grades are ordered, then

lines.toArray.sortBy( -_.split(' ')(2).toFloat )

where lines is any Traversable[String] for the input lines, e.g.

scala.io.Source.fromFile("grd.txt").getLines

will sort them. Note that it is more efficient to split on the space character ' ' than on a string containing a single space " ". And because we negate the grade value for comparison we don't have to reverse the result after sorting.

But you would not want to get caught contributing code like that to a real project -- it is egregiously inefficient for large datasets, only partially orders the data, and isn't particularly clear.

Typically you'd want to sort rows with equal grades in some manner, e.g. in ascending order by last name and then first name. It's often easiest just to turn the data into tuples the way you want to sort them -- the compare method for tuples compares piece by piece -- and then map the sorted tuples to the desired final form:

lines.map( _ split ' ' ).map{
  case Array(first,last,grade) => ( -grade.toFloat, last, first )
}.toArray.sorted.map{ case (g,l,f) => (f,l,-g) }

This approach is also far more efficient than calling sortBy with a function that calls toFloat, since that would cause each value to be converted many times -- each time the value is compared to another. It's best to convert the data up front and then sort it if performance matters at all.

Of course the above could be made shorter by using x(0), x(1), and x(2) instead of destructuring the array into first, last, and grade, but it is much clearer with the name bindings.

We arranged to get a descending sort on the grade by negating the grade value, but that doesn't generalize to non-numeric data types. More generally you could supply your own compare method via an explicit Ordering, in which case you might as well create the tuples in their final form and write the method to deal with that (obviating that final map to put the data in its final form):

lines.map( _ split ' ' ).map{
  case Array(first,last,grade) => ( first, last, grade.toFloat )
}.toArray.sorted( {
  type T = (String,String,Float)  // first name, last name, grade
  new Ordering[T] {
    // grade DESCENDING, then last name, then first name
    def compare( a:T, b:T ) = {
      val cmp3 = b._3 compare a._3  // descending, so b first
      if ( cmp3 == 0 ) {
        val cmp2 = a._2 compare b._2
        if ( cmp2 == 0 ) a._1 compare b._1 else cmp2
      } else
        cmp3
    }
  }
} )

If there are a lot of entries and this code is critical to performance, you should probably use Sorting.quickSort to do a fast in-place sort on the array:

val grades = lines.map( _ split ' ' ).map{
  case Array(first,last,grade) => ( first, last, grade.toFloat )
}.toArray

scala.util.Sorting.quickSort(grades)( {
  type T = (String,String,Float)
  new Ordering[T] {
    // grade DESCENDING, then last name, then first name
    def compare( a:T, b:T ) = {
      val cmp3 = b._3 compare a._3  // descending, so b to a
      if ( cmp3 == 0 ) {
        val cmp2 = a._2 compare b._2
        if ( cmp2 == 0 ) a._1 compare b._1 else cmp2
      } else
        cmp3
    }
  }
} )

This is better for large datasets for two reasons:

  1. The quicksort algorithm is faster than mergesort (which is what sorted will get you when called on a sequence of objects, via java.util.Arrays.sort). Note, however, that quicksort is not a stable sort! We don't need a stable sort here because we are comparing on all the fields, so we can use quicksort.
  2. The sorted method will make a copy of the array and sort that. That means that if you have millions of rows, you have two huge objects on the heap when you really only need one -- we don't need the unsorted data after we are done sorting. And large objects like this are usually allocated directly out of generational storage, where they will hang around for a while before getting collected. The in-place sort prevents that needless abuse of the heap, not to mention the time spent needlessly copying the data.

The compare method is ugly because we are just using tuples; if it were me I would define a case class to hold the data and map the data into instances of that:

case class Grade( first:String, last:String, grade:Float )
val grades = lines.map( _ split ' ' ).map{
  case Array(first,last,grade) => Grade( first, last, grade.toFloat )
}.toArray

Then you could write the compare method more cleanly:

scala.util.Sorting.quickSort(grades)(
  new Ordering[Grade] {
    def compare( a:Grade, b:Grade ) = {
      val gradeCmp = b.grade compare a.grade  // descending, so b to a
      if ( gradeCmp == 0 ) {
        val lastCmp = a.last compare b.last
        if ( lastCmp == 0 ) a.first compare b.first else lastCmp
      } else
        gradeCmp
    }
  }
)

Upvotes: 2

Konstantin
Konstantin

Reputation: 3294

val lines = source.getLines().toSeq.sortBy(x=>x.split(" ")(2).toFloat).reverse

Upvotes: 3

Brian
Brian

Reputation: 20295

Paste the data with triple quotes.

"""Aoe Samore 3.1
Boe Sbmore 2.2
Coe Scmore 3.9
Doe Sdmore 2.4
Eoe Semore 3.5
Foe Sfmore 2.6
Goe Sgmore 3.7
Hoe Shmore 2.9
Ioe Simore 3.1
Joe Sjmore 1.2
Koe Skmore 3.2
Loe Slmore 4.0"""

Then do some string manipulation, sort by the second column, and reverse to get descending order.

scala> data.lines.map(s => s.split(" ")).toList.sortBy(row => row(2).toFloat).reverse

res16: List[Array[String]] = List(Array(Loe, Slmore, 4.0), Array(Coe, Scmore, 3.9), Array(Goe, Sgmore, 3.7), Array(Eoe, Semore, 3.5), Array(Koe, Skmore, 3.2), Array(Ioe, Simore, 3.1), Array(Aoe, Samore, 3.1), Array(Hoe, Shmore, 2.9), Array(Foe, Sfmore, 2.6), Array(Doe, Sdmore, 2.4), Array(Boe, Sbmore, 2.2), Array(Joe, Sjmore, 1.2))

To get this into your desired string format do some more string manipulation.

scala> res16.map(_.mkString(" ")).mkString("\n")
res17: String = 
Loe Slmore 4.0
Coe Scmore 3.9
Goe Sgmore 3.7
Eoe Semore 3.5
Koe Skmore 3.2
Ioe Simore 3.1
Aoe Samore 3.1
Hoe Shmore 2.9
Foe Sfmore 2.6
Doe Sdmore 2.4
Boe Sbmore 2.2
Joe Sjmore 1.2

Upvotes: 1

senia
senia

Reputation: 38045

def getDouble(line: String) = line.split(" ").apply(2).toDouble

for (l <- lines.toSeq.sortBy(i => - getDouble(i)))
  println(l)
  1. Convert Iterator to collection in memory using toSeq.
  2. Sort it by negated double value.
  3. Print lines or sorted collection.

Upvotes: 1

Related Questions