I am new to Scala and Spark and am trying to perform the following operation on a DataFrame column. I have a column with alphanumeric values and want to update those values based on a mathematical operation:
+--------------------------------------+
|Error                                 |
+--------------------------------------+
|value: 0.25 Does not meet Requirements|
|value: 0.5 Does not meet Requirements|
|value: 0.75 Does not meet Requirements|
|value: 0.66 Does not meet Requirements|
|value: 0.34 Does not meet Requirements|
+--------------------------------------+
I want to take the numeric value out of each string, compute 1 - value, and update the column with the new values.
For example, I want the output to look like this:
+--------------------------------------+
|Error                                 |
+--------------------------------------+
|value: 0.75 Does not meet Requirements|
|value: 0.5 Does not meet Requirements|
|value: 0.25 Does not meet Requirements|
|value: 0.34 Does not meet Requirements|
|value: 0.66 Does not meet Requirements|
+--------------------------------------+
Any help would be appreciated. I have learned about the withColumn method with regular expressions, but I have no lead on how to perform the mathematical operation.
Regards, Mahi
Let's assume you have more than one column:
+------+--------------------+
|  col1|               Error|
+------+--------------------+
| first|value: 0.25 Does ...|
|second|value: 0.5 Does ...|
| third|value: 0.75 Does ...|
|fourth|value: 0.66 Does ...|
| fifth|value: 0.34 Does ...|
+------+--------------------+
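You can update the Error column by splitting each string on spaces with split, replacing the numeric token with one minus its value, and joining the tokens back together with mkString: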
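import org.apache.spark.sql.{Encoder, Row}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import scala.util.Try

// 1 - number, computed with BigDecimal so the result keeps the input's scale
val subtractFromOne: Double => String = number =>
  (BigDecimal(1.0) - BigDecimal(number)).toString()

// Replace the second space-separated token (the number) with 1 - number
val transform: String => String = s => s.split(' ') match {
  case Array(first, number, rest @ _*) if Try(number.toDouble).isSuccess =>
    (Seq(first, subtractFromOne(number.toDouble)) ++ rest).mkString(" ")
  case _ => s // too few tokens or a non-numeric second token: return unchanged
}

// Dataset.map over Rows needs an Encoder[Row]; RowEncoder is a catalyst-internal API
implicit val enc: Encoder[Row] = RowEncoder(df.schema)

df
  .map(row => Row(row(0), transform(row.getString(1)))) // col1 first, Error second
  .show(false) // false = print the full strings instead of truncating them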
Will output:
+------+--------------------------------------+
|  col1|                                 Error|
+------+--------------------------------------+
| first|value: 0.75 Does not meet Requirements|
|second|value: 0.5 Does not meet Requirements|
| third|value: 0.25 Does not meet Requirements|
|fourth|value: 0.34 Does not meet Requirements|
| fifth|value: 0.66 Does not meet Requirements|
+------+--------------------------------------+
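Since the question mentions regular expressions, here is a minimal plain-Scala sketch of the same per-string step built on a regex instead of split. It assumes the number always directly follows the prefix "value: "; the object name ErrorMessage and the method subtractValueFromOne are illustrative names, not part of the answer above. Strings without a match are returned unchanged.

```scala
import scala.util.matching.Regex

object ErrorMessage {
  // Assumed format: the number comes right after "value: ", e.g. "value: 0.25 Does ..."
  private val ValuePattern: Regex = """value: (\d+(?:\.\d+)?)""".r

  // Replaces each number following "value: " with 1 - number, using BigDecimal so
  // 0.25 becomes 0.75 rather than a binary-rounded Double.
  // Strings that do not match the pattern are returned unchanged.
  def subtractValueFromOne(s: String): String =
    ValuePattern.replaceAllIn(s, (m: Regex.Match) => {
      val result = BigDecimal(1) - BigDecimal(m.group(1))
      // quoteReplacement stops any '$' or '\' in the text being read as a group reference
      Regex.quoteReplacement(s"value: $result")
    })

  def main(args: Array[String]): Unit =
    Seq(
      "value: 0.25 Does not meet Requirements",
      "value: 0.5 Does not meet Requirements",
      "no value here"
    ).map(subtractValueFromOne).foreach(println)
}
```

To apply it with withColumn, the method could be wrapped in a UDF from org.apache.spark.sql.functions, for example a hypothetical val subtractUdf = udf((s: String) => ErrorMessage.subtractValueFromOne(s)) used as df.withColumn("Error", subtractUdf(col("Error"))). This assumes the column contains no nulls, since replaceAllIn throws on a null string.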
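BigDecimal is used instead of plain Double arithmetic so that the result keeps the scale of the input: with doubles, 1.0 - 0.66 prints as 0.33999999999999997 rather than 0.34.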
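To see why the scale matters, a small standalone comparison of plain Double subtraction against the BigDecimal version used above could look like this. ScaleDemo, viaDouble and viaBigDecimal are hypothetical names chosen for illustration.

```scala
object ScaleDemo {
  // Subtract from one in binary floating point
  def viaDouble(x: Double): Double = 1.0 - x

  // Subtract from one in decimal, as in the answer's subtractFromOne.
  // Scala's BigDecimal(Double) goes through Double.toString, so BigDecimal(0.66) is exactly 0.66.
  def viaBigDecimal(x: Double): BigDecimal = BigDecimal(1.0) - BigDecimal(x)

  def main(args: Array[String]): Unit =
    for (x <- Seq(0.25, 0.5, 0.75, 0.66, 0.34))
      println(s"1 - $x  Double: ${viaDouble(x)}  BigDecimal: ${viaBigDecimal(x)}")
}
```

The BigDecimal results come out as 0.75, 0.5, 0.25, 0.34 and 0.66, matching the expected output, while some of the Double results (such as 1 - 0.66) carry binary rounding noise.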