Reputation: 135
I have a column of scores and I want to create a column of bands for this score.
Example: Score 623 should have a column with value (600, 625]
.
In python I used to do this like this:
df['score_band_25'] = df['score'].apply(lambda x: '('+str((int((x-0.1)//25)*25))+', '+str(int((x-0.1)//25)*25+25)+']')
How can I do this in Scala? The best I could come up with so far was:
.withColumn("score_band_25", floor(($"score"-0.1)/25)*25)
but that only gives me the first part of the tuple, and I can't find a way to unite floor(($"score"-0.1)/25)*25)
and floor(($"score"-0.1)/25)*25+25)
as strings.
Ideally I would like to keep having a (
at the beginning and a ]
at the end, but I would be happy with two (600, 625)
too.
Upvotes: 0
Views: 168
Reputation: 22449
If I understand your requirement correctly, you're trying to create a String-type column of half-open intervals corresponding to the values of an Integer column – in which case you can simply use concat()
to concatenate the calculated interval limits:
import org.apache.spark.sql.functions._
val df = Seq(
12, 149, 300, 623
).toDF("score")
val interval = 25
df.withColumn("score_band_" + interval, concat(
lit("("),
floor(($"score" - 0.1) / interval) * interval,
lit(", "),
floor(($"score" - 0.1) / interval) * interval + interval,
lit("]")
)).show
// +-----+-------------+
// |score|score_band_25|
// +-----+-------------+
// | 12| (0, 25]|
// | 149| (125, 150]|
// | 300| (275, 300]|
// | 623| (600, 625]|
// +-----+-------------+
Note that concat
takes Column
-type parameters, thus lit()
is needed for the literal strings (e.g. "("
).
Upvotes: 2