Reputation: 35
I am new to spark and trying to play with data to get practice. I am using databricks in scala and for dataset I am using fifa 19 complete player dataset from kaggle. one of the column named "Weight" which contains the data that looks like
+------+
|Weight|
+------+
|136lbs|
|156lbs|
|136lbs|
|... |
|... |
+------+
I want to change the column in such a way to look like this
+------+
|Weight|
+------+
|136 |
|156 |
|136 |
|... |
|... |
+------+
Can any one help how I can change the column values in spark sql.
Upvotes: 1
Views: 9506
Reputation: 7316
Here is another way using regex and the regexp_extract build-in function:
import org.apache.spark.sql.functions.regexp_extract
val df = Seq(
"136lbs",
"150lbs",
"12lbs",
"30kg",
"500kg")
.toDF("weight")
df.withColumn("weight_num", regexp_extract($"weight", "\\d+", 0))
.withColumn("weight_unit", regexp_extract($"weight", "[a-z]+", 0))
.show
//Output
+------+----------+-----------+
|weight|weight_num|weight_unit|
+------+----------+-----------+
|136lbs| 136| lbs|
|150lbs| 150| lbs|
| 12lbs| 12| lbs|
| 30kg| 30| kg|
| 500kg| 500| kg|
+------+----------+-----------+
Upvotes: 4
Reputation: 3470
You can create a new column and use regexp_replace
dataFrame.withColumn("Weight2", regexp_replace($"Weight" , lit("lbs"), lit("")))
Upvotes: 1