Reputation: 115
I have the following column that needs to be transformed into a decimal. Currently the column ent_Rentabiliteit_ent_rentabiliteit
is a string, and I need to cast it to a data type that keeps the same values, so after the transformation values such as -0.7
or -1.2 must still be shown.
My current code looks like this:
from pyspark.sql import functions as F
from pyspark.sql.types import DecimalType

df2test = df2.withColumn(
    "ent_Rentabiliteit_ent_rentabiliteit",
    F.regexp_replace("ent_Rentabiliteit_ent_rentabiliteit", ".", ".").cast(
        DecimalType()
    ),
)
Any suggestions? Many thanks.
Upvotes: 1
Views: 1300
Reputation: 2936
You could just cast the column to DecimalType.
Example:
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DecimalType

spark = SparkSession.builder.getOrCreate()

data = [
    {"ent_Rentabiliteit_ent_rentabiliteit": "1.23"},
    {"ent_Rentabiliteit_ent_rentabiliteit": "-2.33"},
]
df = spark.createDataFrame(data)

# Cast the string column directly; pick a precision/scale that fits the data.
df = df.withColumn(
    "new_col", F.col("ent_Rentabiliteit_ent_rentabiliteit").cast(DecimalType(10, 4))
)
Result of df.show():
+-----------------------------------+-------+
|ent_Rentabiliteit_ent_rentabiliteit|new_col|
+-----------------------------------+-------+
| 1.23| 1.2300|
| -2.33|-2.3300|
+-----------------------------------+-------+
Schema from df.printSchema():
root
|-- ent_Rentabiliteit_ent_rentabiliteit: string (nullable = true)
|-- new_col: decimal(10,4) (nullable = true)
Please keep in mind that DecimalType must have fixed precision. When creating a DecimalType, the default precision and scale is (10, 0). When inferring a schema from decimal.Decimal objects, it will be DecimalType(38, 18).
Please read more about precision here: DecimalType
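That default is also why the cast in the question loses the fraction. Here is a minimal sketch of what the bare DecimalType() from the original code does, reusing the df from above (df_default and default_col are just illustrative names):
# The default DecimalType() is decimal(10, 0): scale 0, so the
# fractional digits are rounded away on the cast.
df_default = df.withColumn(
    "default_col", F.col("ent_Rentabiliteit_ent_rentabiliteit").cast(DecimalType())
)
df_default.show()
# With scale 0, "1.23" comes out as 1 and "-2.33" as -2, so values
# like -0.7 would not survive. An explicit scale, e.g. DecimalType(10, 4),
# keeps them intact.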
Upvotes: 2