Mahi

Reputation: 403

How to extract the numeric part from a string column in Spark and update the same column after a mathematical operation

I am new to Scala and Spark and am trying to perform the operation below on a DataFrame column. The column holds alphanumeric values, and I want to update those values based on a mathematical operation.

    +--------------------------------------+
    |Error                                 |
    +--------------------------------------+
    |value: 0.25 Does not meet Requirements|
    |value: 0.5  Does not meet Requirements|
    |value: 0.75 Does not meet Requirements|
    |value: 0.66 Does not meet Requirements|
    |value: 0.34 Does not meet Requirements|
    +--------------------------------------+

I want to compute (1 - {numeric value from the string}) and update the column with the new value.

For example, I want the output to look like this:

    +--------------------------------------+
    |Error                                 |
    +--------------------------------------+
    |value: 0.75 Does not meet Requirements|
    |value: 0.5  Does not meet Requirements|
    |value: 0.25 Does not meet Requirements|
    |value: 0.34 Does not meet Requirements|
    |value: 0.66 Does not meet Requirements|
    +--------------------------------------+

Any help would be appreciated. I have read about the withColumn method combined with regular expressions, but I have not found any lead on how to perform the mathematical operation.

Regards Mahi

Upvotes: 0

Views: 270

Answers (1)

Duelist

Reputation: 1572

Let's assume you have more than one column:

    +------+--------------------+
    |  col1|               Error|
    +------+--------------------+
    | first|value: 0.25 Does ...|
    |second|value: 0.5  Does ...|
    | third|value: 0.75 Does ...|
    |fourth|value: 0.66 Does ...|
    | fifth|value: 0.34 Does ...|
    +------+--------------------+

You can update the column Error using split with mkString.

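    import scala.util.Try
    import org.apache.spark.sql.{Encoder, Row}
    import org.apache.spark.sql.catalyst.encoders.RowEncoder

    // 1 - number, computed with BigDecimal so the decimal scale is preserved
    val subtractFromOne: Double => String = number =>
      (BigDecimal(1.0) - BigDecimal(number)).toString()

    // Split on spaces, replace the second token (the number) and rejoin
    val transform: String => String = s => s.split(' ') match {
      case Array(first, number, rest @ _*) if Try(number.toDouble).isSuccess =>
        (Seq(first, subtractFromOne(number.toDouble)) ++ rest).mkString(" ")
      case _ => s // if the string is invalid, return it unchanged
    }

    // Mapping to Row needs an explicit encoder; assumes the schema is (col1, Error)
    implicit val enc: Encoder[Row] = RowEncoder(df.schema)

    df
      .map(row => Row(row(0), transform(row.getString(1))))
      .show(20, 40) // default show() would truncate cells to 20 characters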

This outputs:

    +------+--------------------------------------+
    |  col1|                                 Error|
    +------+--------------------------------------+
    | first|value: 0.75 Does not meet Requirements|
    |second|value: 0.5  Does not meet Requirements|
    | third|value: 0.25 Does not meet Requirements|
    |fourth|value: 0.34 Does not meet Requirements|
    | fifth|value: 0.66 Does not meet Requirements|
    +------+--------------------------------------+
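
Since the question mentions regular expressions, here is a rough sketch of the same string transformation done with a regex instead of split. It is plain Scala so it can be tried without a Spark session. The names `RegexTransform` and `subtractFirstNumberFromOne` are made up for illustration, and the sketch assumes the first number in the string is the non-negative value to update.

```scala
import scala.util.matching.Regex

// Hypothetical helper, not part of the code above.
object RegexTransform {
  // Matches the first unsigned decimal number, e.g. 0.25 or 1
  private val NumberPattern: Regex = """\d+(\.\d+)?""".r

  // Replace the first number n in the string with (1 - n);
  // strings that contain no number are returned unchanged.
  def subtractFirstNumberFromOne(s: String): String =
    NumberPattern.findFirstMatchIn(s) match {
      case Some(m) =>
        val updated = (BigDecimal(1) - BigDecimal(m.matched)).toString
        s.substring(0, m.start) + updated + s.substring(m.end)
      case None => s
    }
}
```

One way to apply it in Spark would be to wrap the function with `udf` from `org.apache.spark.sql.functions` and call `df.withColumn("Error", ...)`, which replaces the existing column of that name. Null values are not handled in this sketch and would need a guard.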

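BigDecimal is used to keep the scale (the number of decimal places), so that 1 - 0.66 gives exactly 0.34 instead of a floating-point approximation.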
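
To illustrate the point about scale, this small comparison (the object name `ScaleDemo` is only for illustration) contrasts plain Double subtraction with the BigDecimal version for one of the sample values:

```scala
object ScaleDemo {
  // Plain Double arithmetic: 0.66 has no exact binary representation,
  // so the difference is not exactly 0.34 and prints with a long tail of digits.
  val viaDouble: String = (1.0 - 0.66).toString

  // BigDecimal(Double) is built from the Double's decimal string form,
  // so the subtraction is done in decimal and the result keeps two decimal places.
  val viaBigDecimal: String = (BigDecimal(1.0) - BigDecimal(0.66)).toString
}
```

Here `ScaleDemo.viaBigDecimal` is the string 0.34, while `ScaleDemo.viaDouble` is a value slightly below 0.34, which would make the updated Error text harder to read.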

Upvotes: 1
