Reputation: 171
I have an input dataframe (ip_df), and the data in this dataframe looks like below:
id col_value
1 10
2 11
3 12
The data type of both id and col_value is String.
I need to get another dataframe (output_df) that keeps id as string and has col_value as decimal(15,4). There is no data transformation, just a data type conversion. Can I do this using PySpark? Any help will be appreciated.
Upvotes: 10
Views: 32387
Reputation: 1377
You can change multiple column types using any of the approaches below.
withColumn()
from pyspark.sql.types import DecimalType, StringType
output_df = ip_df \
    .withColumn("col_value", ip_df["col_value"].cast(DecimalType(15, 4))) \
    .withColumn("id", ip_df["id"].cast(StringType()))
select()
from pyspark.sql.types import DecimalType, StringType
output_df = ip_df.select(
    ip_df.id.cast(StringType()).alias('id'),
    ip_df.col_value.cast(DecimalType(15, 4)).alias('col_value')
)
spark.sql()
ip_df.createOrReplaceTempView("ip_df_view")
output_df = spark.sql('''
    SELECT
        STRING(id) AS id,
        CAST(col_value AS DECIMAL(15, 4)) AS col_value
    FROM ip_df_view
''')
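To double-check the result, here is a minimal sketch (it assumes an active SparkSession named spark and rebuilds the sample ip_df from the question):
from pyspark.sql import SparkSession
from pyspark.sql.types import DecimalType
spark = SparkSession.builder.getOrCreate()
# Sample input matching the question; both columns start out as strings
ip_df = spark.createDataFrame([("1", "10"), ("2", "11"), ("3", "12")], ["id", "col_value"])
output_df = ip_df.withColumn("col_value", ip_df["col_value"].cast(DecimalType(15, 4)))
output_df.printSchema()
# root
#  |-- id: string (nullable = true)
#  |-- col_value: decimal(15,4) (nullable = true)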
Upvotes: 1
Reputation: 3110
Try the statement below.
output_df = ip_df.withColumn("col_value", ip_df["col_value"].cast('float'))
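Note that cast() also accepts a SQL type string, so if you need the exact decimal(15,4) from the question, the same one-liner can be written as:
output_df = ip_df.withColumn("col_value", ip_df["col_value"].cast('decimal(15,4)'))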
Upvotes: 5