Reputation: 2287
I am a newbie trying to learn Scala after using PHP for many years. Problem I am trying to solve is really simple but not sure how to do in Scala. Basically I am reading content from a file which have person first name, last name and grades. I need to read the file and sort the names by grades.
File is like this.
Aoe Samore 3.1
Boe Sbmore 2.2
Coe Scmore 3.9
Doe Sdmore 2.4
Eoe Semore 3.5
Foe Sfmore 2.6
Goe Sgmore 3.7
Hoe Shmore 2.9
Ioe Simore 3.1
Joe Sjmore 1.2
Koe Skmore 3.2
Loe Slmore 4.0
End result should be just display like this
Loe Slmore 4
Coe Scmore 3.9
Goe Sgmore 3.7
Eoe Semore 3.5
Koe Skmore 3.2
Aoe Samore 3.1
Ioe Simore 3.1
Hoe Shmore 2.9
Foe Sfmore 2.6
Doe Sdmore 2.4
Boe Sbmore 2.2
Joe Sjmore 1.2
This is how I did in PHP
$content = file_get_contents('grd.txt');
$lines = explode("\n",$content);
$grades_with_information = [];
foreach($lines as $line) {
$temp = explode(' ',$line);
$temp[2] = (float)$temp[2];
$grades_with_information[] = $temp;
}
usort($grades_with_information, function($a, $b) {
if ($a[2] == $b[2]) {
return 0;
}
return ($a[2] > $b[2]) ? -1 : 1;
});
foreach($grades_with_information as $grade_with_information){
echo $grade_with_information[0].' '
.$grade_with_information[1].' '
.$grade_with_information[2]
.'<br>';
}
How we about doing this in Scala so far I have done this
val source = scala.io.Source.fromFile("grd.txt")
val lines = source.getLines()
for (a <- lines) {
}
source.close()
Any suggestion/help/clues?
Upvotes: 0
Views: 175
Reputation: 6862
If this is throwaway code and you don't care how rows with equal grades are ordered, then
lines.toArray.sortBy( -_.split(' ')(2).toFloat )
where lines
is any Traversable[String]
for the input lines, e.g.
scala.io.Source.fromFile("grd.txt").getLines
will sort them. Note that it is more efficient to split on the space character ' '
than on a string containing a single space " "
. And because we negate the grade value for comparison we don't have to reverse the result after sorting.
But you would not want to get caught contributing code like that to a real project -- it is egregiously inefficient for large datasets, only partially orders the data, and isn't particularly clear.
Typically you'd want to sort rows with equal grades in some manner, e.g. in ascending order by last name and then first name. It's often easiest just to turn the data into tuples the way you want to sort them -- the compare
method for tuples compares piece by piece -- and then map the sorted tuples to the desired final form:
lines.map( _ split ' ' ).map{
case Array(first,last,grade) => ( -grade.toFloat, last, first )
}.toArray.sorted.map{ case (g,l,f) => (f,l,-g) }
This approach is also far more efficient than calling sortBy
with a function that calls toFloat
, since that would cause each value to be converted many times -- each time the value is compared to another. It's best to convert the data up front and then sort it if performance matters at all.
Of course the above could be made shorter by using x(0), x(1), and x(2) instead of destructuring the array into first, last, and grade, but it is much clearer with the name bindings.
We arranged to get a descending sort on the grade by negating the grade value, but that doesn't generalize to non-numeric data types. More generally you could supply your own compare
method via an explicit Ordering
, in which case you might as well create the tuples in their final form and write the method to deal with that (obviating that final map
to put the data in its final form):
lines.map( _ split ' ' ).map{
case Array(first,last,grade) => ( first, last, grade.toFloat )
}.toArray.sorted( {
type T = (String,String,Float) // first name, last name, grade
new Ordering[T] {
// grade DESCENDING, then last name, then first name
def compare( a:T, b:T ) = {
val cmp3 = b._3 compare a._3 // descending, so b first
if ( cmp3 == 0 ) {
val cmp2 = a._2 compare b._2
if ( cmp2 == 0 ) a._1 compare b._1 else cmp2
} else
cmp3
}
}
} )
If there are a lot of entries and this code is critical to performance, you should probably use Sorting.quickSort
to do a fast in-place sort on the array:
val grades = lines.map( _ split ' ' ).map{
case Array(first,last,grade) => ( first, last, grade.toFloat )
}.toArray
scala.util.Sorting.quickSort(grades)( {
type T = (String,String,Float)
new Ordering[T] {
// grade DESCENDING, then last name, then first name
def compare( a:T, b:T ) = {
val cmp3 = b._3 compare a._3 // descending, so b to a
if ( cmp3 == 0 ) {
val cmp2 = a._2 compare b._2
if ( cmp2 == 0 ) a._1 compare b._1 else cmp2
} else
cmp3
}
}
} )
This is better for large datasets for two reasons:
sorted
will get you when called on a sequence of objects, via java.util.Arrays.sort
). Note, however, that quicksort is not a stable sort! We don't need a stable sort here because we are comparing on all the fields, so we can use quicksort.sorted
method will make a copy of the array and sort that. That means that if you have millions of rows, you have two huge objects on the heap when you really only need one -- we don't need the unsorted data after we are done sorting. And large objects like this are usually allocated directly out of generational storage, where they will hang around for a while before getting collected. The in-place sort prevents that needless abuse of the heap, not to mention the time spent needlessly copying the data.The compare
method is ugly because we are just using tuples; if it were me I would define a case class to hold the data and map the data into instances of that:
case class Grade( first:String, last:String, grade:Float )
val grades = lines.map( _ split ' ' ).map{
case Array(first,last,grade) => Grade( first, last, grade.toFloat )
}.toArray
Then you could write the compare
method more cleanly:
scala.util.Sorting.quickSort(grades)(
new Ordering[Grade] {
def compare( a:Grade, b:Grade ) = {
val gradeCmp = b.grade compare a.grade // descending, so b to a
if ( gradeCmp == 0 ) {
val lastCmp = a.last compare b.last
if ( lastCmp == 0 ) a.first compare b.first else lastCmp
} else
gradeCmp
}
}
)
Upvotes: 2
Reputation: 3294
val lines = source.getLines().toSeq.sortBy(x=>x.split(" ")(2).toFloat).reverse
Upvotes: 3
Reputation: 20295
Paste the data with triple quotes.
"""Aoe Samore 3.1
Boe Sbmore 2.2
Coe Scmore 3.9
Doe Sdmore 2.4
Eoe Semore 3.5
Foe Sfmore 2.6
Goe Sgmore 3.7
Hoe Shmore 2.9
Ioe Simore 3.1
Joe Sjmore 1.2
Koe Skmore 3.2
Loe Slmore 4.0"""
Then do some string manipulation, sort by the second column, and reverse to get descending order.
scala> data.lines.map(s => s.split(" ")).toList.sortBy(row => row(2).toFloat).reverse
res16: List[Array[String]] = List(Array(Loe, Slmore, 4.0), Array(Coe, Scmore, 3.9), Array(Goe, Sgmore, 3.7), Array(Eoe, Semore, 3.5), Array(Koe, Skmore, 3.2), Array(Ioe, Simore, 3.1), Array(Aoe, Samore, 3.1), Array(Hoe, Shmore, 2.9), Array(Foe, Sfmore, 2.6), Array(Doe, Sdmore, 2.4), Array(Boe, Sbmore, 2.2), Array(Joe, Sjmore, 1.2))
To get this into your desired string format do some more string manipulation.
scala> res16.map(_.mkString(" ")).mkString("\n")
res17: String =
Loe Slmore 4.0
Coe Scmore 3.9
Goe Sgmore 3.7
Eoe Semore 3.5
Koe Skmore 3.2
Ioe Simore 3.1
Aoe Samore 3.1
Hoe Shmore 2.9
Foe Sfmore 2.6
Doe Sdmore 2.4
Boe Sbmore 2.2
Joe Sjmore 1.2
Upvotes: 1
Reputation: 38045
def getDouble(line: String) = line.split(" ").apply(2).toDouble
for (l <- lines.toSeq.sortBy(i => - getDouble(i)))
println(l)
Iterator
to collection in memory using toSeq
.Upvotes: 1